Section 1

INTRODUCTION AND SUMMARY 


This Interim Report documents the CARE-III mathematical model and code
verification performed by Boeing Computer Services from January 1982 
through November 1982. The mathematical model has been verified for per- 
manent and intermittent faults. The transient fault model has not been 
addressed. The code verification has been performed on CARE-III, Version 
3. A CARE-III Version 4, which corrects deficiencies identified in 
Version 3, is being developed as part of the overall study. 

1.1 BACKGROUND OF PROBLEM 

Fault-tolerant flight control systems (FTFCS) are designed to be
ultra-reliable. Key modules are redundant to a level that makes the probability
of failure due to spares exhaustion extremely small. These systems are 
designed to mask the faulty operation of a failed module until the system 
can successfully reconfigure with a spare. This masking of the faulty 
module to an observer outside the system comprises the fault-tolerance of 
the system. System failure due to improper masking is called a coverage 
failure. These systems are designed with sufficient redundancy such that 
coverage failure greatly dominates spares exhaustion as the mode of system 
failure. 

The reliability of any proposed FTFCS must obviously be demonstrated. 
Since the reliability needs to be extremely high, assessment of the relia- 
bility must come from engineering analysis and reliability modeling. 
Laboratory testing of a system with mean time between failures greater than
10^6 hours is obviously not practical.

For each proposed FTFCS, a reliability model and program could conceivably 
be developed. The alternative is to develop a general reliability program 
which permits the representation of systems with diverse architecture and 
fault-masking techniques. Under NASA funding, Raytheon has been
developing such a program. CARE-III represents the current level of this
sequential development.

1.2 CARE-III GENERAL APPROACH 

CARE-III is a reliability program which permits the evaluation of complex
redundant fault-tolerant systems. The program was designed for the evalu- 
ation of FTFCS but is sufficiently general to permit, in principle, use 
for a wide variety of systems. This generality is discussed in more 
detail in Section 2.2. 

The reliability model which CARE-III addresses is a semi-Markov process 
with an unmanageably large number of states. Assumptions about the rela- 
tive size of the module failure and coverage parameters permit this de- 
tailed micro model to be approximated by a macro Markov model with a 
greatly reduced number of states. Furthermore, replacing detailed inform- 
ation contained in the micro model by probabilities of the corresponding 
events in the macro model permits the separation of the reliability model 
into a coverage model, which must be solved only once, and a reliability 
program which uses the coverage model output (see Section 2.3). 

1.3 OBJECTIVE OF THIS PROJECT 

The objectives of this project are the verification of the mathematical 
model and the computer code (Task 1) and the test stressing (Task 2) of 
CARE-III. This interim report addresses the results to date on Task 1. 
Additional Task 1 results and all Task 2 findings will be addressed in
the final report.

Task 1 


In the mathematical model verification, equations are to be independently 
derived from the basic model. The solution approach implemented is to be
investigated with respect to accuracy and stability. Approximations used
in the simplification of the model and the solution approach are to be 
reviewed and evaluated. 

In the computer code verification, the program structure, algorithms and
equations are to be reviewed. In the program structure review, modularity,
maintainability, internal structure logic and data storage are to be
evaluated. The choice and implementation of algorithms for numerically
solving or evaluating equations are to be reviewed. Equations derived from
the code are compared to those from the mathematical model.

This verification process is intended to assure that CARE-III is a
mathematically valid reliability tool for a well-defined set of problems.

1.4 QUESTIONS TO BE ADDRESSED 

During this investigation of CARE-III, a number of questions have arisen. 
Many of these have been resolved and are addressed in this interim report. 

• Documentation - The original Phase II theory document for CARE-III
was inadequate for describing the model and program. This
interim report is intended to fill some of that void.

• Mathematical Model 

- The detailed stochastic model for the system represented 

- The simplified stochastic model approximating the detailed 
model 

- The derivation of transition rates 

- The solution approach (differential equation versus integral 
equation versus approximate integral equation solutions) 

- Coverage 

- Transient failure model 
• Model Implemented in Code


- Program architecture 

- Algorithms chosen 

- Approximations in solution implementation 

- Program efficiency 

- Reliability 

- System architecture for computing Q and P* 

- Sparing algorithm 

• Representation of FTFCS - A user's guide is needed which shows how
the user may go from an understanding of a system (system structure,
error rates, detection, isolation and reconfiguration rates)
to representing the system in CARE-III.

- What systems may be represented by CARE-III? 

- How is software failure modeled? 

- How are stage and module dependencies handled? 

1.5 SUMMARY OF RESULTS 
1.5.1 General Comments 


During Task 1, Model Verification, the BCS team has extensively reviewed 
the CARE-III model, the CARE-III documentation and the CARE-III program 
(Version 3, 1982). We agree with the vast majority of the material
evaluated. There are, however, several areas which we feel need either further
development or reworking. 

The development of the CARE-III program shows an understanding of the 
basic requirements for a reliability program for a FTFCS. The basic 
structure chosen, a non-homogeneous semi-Markov process, appears to provide
a general structure which permits representing the operation of a
FTFCS (e.g., scheduled computations; sparing; majority voters; permanent,
intermittent and transient faults; fault coverage). Due to the excessively
large number of states in this model for any reasonable system,
implementation is not feasible. Dr. Stiffler replaces this model with a
much simpler approximation, a non-homogeneous Markov model with a greatly
reduced number of states. Appropriate assumptions on relative transition
rates permit the separate solution of the coverage and reliability models.
The solution of the resultant model is feasible and is implemented in the
CARE-III program. The CARE-III code exhibits good program structure and
organization. Comments within the program highlight the calculations
performed in the various subroutines. For a reasonably large and complex
program (over 4500 lines of FORTRAN code), relatively few coding errors
were identified during the review.

1.5.2 Documentation

The incomplete existing documentation for CARE-III poses a major problem 
for anyone interested in investigating or understanding the model and 
using the code. The underlying theoretical model, the solution approach, 
the implementation of the model into code, and the choice of algorithms 
implemented are not well documented in the Raytheon Phase II reports 
(Stiffler, J. J. and Bryant, L. A. (1982); Bryant, L. A. and Stiffler,
J. J. (1982a,b)). The intent of this report is to fill some of this
void. 

For a FTFCS designer, the existing User's Manual, Bryant, L. A. and 
Stiffler, J. J. (1982b), does not provide sufficient guidance in the use 
of CARE-III. The principal problem of representation, transferring system 
design information into input parameters, is not addressed. The meaning 
and use of several input parameters are inadequately described. In order 
to make CARE-III a useful tool, this shortcoming must be remedied. 
1.5.3 Theoretical Model


The reliability model which CARE-III implements is an approximation to the
detailed, but intractable, reliability model which better represents a
FTFCS. The assumptions necessary for, and the limitations resulting from,
this approximation are not detailed in the CARE-III documentation. In the
derivation of rates within the model, we take exception to several of
the formulas. The b_xy terms, as defined in the CARE-III documentation and
as implemented in code, lead to some questionable results. These terms,
necessary for computing double-fault coverage probabilities, give different
answers depending on whether modules are numbered from left to right or
from right to left. We believe the definition of b_xy needs to be changed
and have provided a solution (equations 3.4-11 and 3.4-22).

The transient case appears to pose a problem for CARE-III. The most
natural way to represent transient faults is through a reversible model;
that is, transitions from i to i - 1 are possible. An irreversible
model is, however, much easier to solve. CARE-III, an irreversible model, 
addresses transients by modifying the fault occurrence rate. This 
approach has led to computational problems in the code. We believe that 
these problems can be avoided by calculating the intensity of entry into 
the detected as permanent state instead of its probability. It should be 
noted that the formulas used for transient faults in the derivation of 
rates have not been validated and require further investigation. 

1.5.4 Model Implementation and Code 

The reliability model implemented in CARE-III is defined by a system of 
ordinary differential equations for the probabilities of the system to be 
in operational states (P's), coverage failure states (Q's) and exhaustion 
failure states (S's). The differential equations are not solved directly 
by a numerical integration method, but rather the integral solution of the 
equations is computed using numerical quadrature methods. BCS suggests 
that this decision should be reconsidered. To compute the solution, the 
P's in the integral equations for Q and S are replaced by the perfect
coverage probabilities (P*'s), which can be computed directly. The impact
of this approximation on the estimation of the system reliability is not
addressed in the CARE-III documentation or monitored in the program.

The calculation of system reliability is partitioned into "SUBRUN's" which 
consist of the evaluation of the reliability of subsystems which are inde- 
pendent in the sense that modules in different subsystems are not criti- 
cally coupled. The calculation of system reliability from SUBRUN results 
appears to be in error and is under current investigation. 

1.5.5 Algorithms and Data Structures 


The solution of the CARE-III reliability model requires the implementation
of algorithms for the numerical integration of a function and the
numerical convolution of two functions. The quadrature rule used for numerical
integration is Simpson's Rule, and it is adequately programmed in CARE-III.
However, the stepsize for the integration is proportional to the
flight time, since the array sizes for the reliability functions are 
fixed. This may degrade the accuracy of the solution for long flight 
times. The method of moments is used for numerical convolution of the 
module failure rate functions with the coverage failure rate functions. 
The implementation is based on the assumption that the coverage failure 
rate functions decay quickly to zero in the time scale of the module 
failure rate functions. BCS questions whether this assumption is valid in 
all cases; the resolution of the question is important because the con- 
volution is the vital link between the coverage and reliability models. 
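The stepsize concern can be illustrated with a generic composite Simpson's rule. This is a sketch, not the CARE-III routine: the fixed array length `N_POINTS = 101` and the integrand `fast_decay` are hypothetical stand-ins. Because the number of points is fixed, the stepsize h = T/(N_POINTS - 1) grows linearly with the flight time T, so an integrand that varies on a short time scale is resolved well for a short flight and poorly for a long one.

```python
import math

N_POINTS = 101  # hypothetical fixed array length (odd, as Simpson's rule requires)


def simpson(f, t_end, n_points=N_POINTS):
    """Composite Simpson's rule for the integral of f over [0, t_end].

    The number of sample points is fixed, so the stepsize grows
    linearly with t_end, mimicking a fixed-size array.
    """
    if n_points % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number of points")
    h = t_end / (n_points - 1)
    total = f(0.0) + f(t_end)
    for k in range(1, n_points - 1):
        weight = 4 if k % 2 == 1 else 2  # 1, 4, 2, 4, ..., 2, 4, 1 pattern
        total += weight * f(k * h)
    return total * h / 3.0


def fast_decay(t):
    # Hypothetical integrand that decays on a time scale of about 1 hour.
    return math.exp(-t)


for flight_time in (10.0, 1000.0):
    approx = simpson(fast_decay, flight_time)
    exact = 1.0 - math.exp(-flight_time)
    print(f"T = {flight_time:7.1f} h, h = {flight_time / (N_POINTS - 1):5.2f} h, "
          f"relative error = {abs(approx - exact) / exact:.2e}")
```

With 101 points, the 10-hour case is accurate, while the 1000-hour case uses a 10-hour step that cannot resolve the decay at all.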

The solution of the CARE-III coverage model requires the implementation of 
algorithms for the numerical sum of two functions, the numerical integra- 
tion of a function, the numerical convolution of two functions and the 
numerical solution of Volterra integral equations of the second kind. The 
procedure for computing the numerical sum, Simpson's rule for numerical 
integration and the Trapezoidal rule for numerical convolution are adequately
programmed in CARE-III. The procedure for solving Volterra integral
equations is based on linear interpolation and the numerical convolution
algorithm. Although the procedure is adequately programmed in CARE-III,
BCS has questioned its numerical stability; this is a subject of current
study.
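For readers unfamiliar with this problem class, the sketch below solves a Volterra integral equation of the second kind, y(t) = f(t) + ∫_0^t K(t, s) y(s) ds, with the standard trapezoidal scheme on a uniform grid. It is a textbook method chosen for illustration, not the CARE-III procedure (which uses linear interpolation on a grid whose stepsize can only double); the function name and the test problem are hypothetical.

```python
import math
from typing import Callable, List


def solve_volterra2(f: Callable[[float], float],
                    kernel: Callable[[float, float], float],
                    t_end: float, n_steps: int) -> List[float]:
    """Trapezoidal solution of y(t) = f(t) + int_0^t K(t, s) y(s) ds.

    Returns approximate values y(t_0), ..., y(t_n) on a uniform grid.
    """
    h = t_end / n_steps
    t = [k * h for k in range(n_steps + 1)]
    y = [f(t[0])]  # at t = 0 the integral term vanishes
    for n in range(1, n_steps + 1):
        # Trapezoidal weights: h/2 at both end points, h in the interior.
        acc = 0.5 * kernel(t[n], t[0]) * y[0]
        for j in range(1, n):
            acc += kernel(t[n], t[j]) * y[j]
        # y_n appears on both sides (end-point weight); solve for it.
        y_n = (f(t[n]) + h * acc) / (1.0 - 0.5 * h * kernel(t[n], t[n]))
        y.append(y_n)
    return y


# Hypothetical test problem: y(t) = 1 + int_0^t y(s) ds has solution e^t.
values = solve_volterra2(lambda t: 1.0, lambda t, s: 1.0, 1.0, 100)
print(values[-1], math.e)
```

Each step re-sums the whole history, so the cost grows as O(n^2), and an error made early in the y_j values is carried into every later step; that propagation is what a stability analysis of such a scheme examines.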

All the algorithms for the coverage model are closely tied to a CARE-III 
data structure, which permits only doublings of the discrete stepsize. 
This is based on the expectation that all coverage functions are exponen- 
tially decaying, positive functions. The heuristics in the numerical sum, 
convolution and Volterra algorithms indicate that not all coverage
functions meet the expectation. More general algorithms and data structures
may improve the computations in the coverage model. 
Section 2

OVERVIEW OF CARE-III MODEL 


The reliability model implemented in CARE-III is designed to assess the 
reliability of complex, redundant, fault-tolerant systems such as FTFCS's;
requirements for the model are briefly discussed in Section 2.1. As 
pointed out in Section 2.2, the detailed model used in CARE-III requires 
the solution of a semi-Markov process with an excessively large number of 
states. An aggregation procedure, discussed in Section 2.3, is used in 
CARE-III to reduce the solution task to the solution of an aggregate 
reliability model and a coverage model. 

The aggregate reliability model is briefly reviewed in Section 2.4 below 
and is then discussed in detail in Section 3; the theory and derivation of 
the model is given in Sections 3.1 to 3.4 and its implementation in
CARE-III is described in Sections 3.5 to 3.6. The coverage model is briefly
reviewed in Section 2.5 below and is then discussed in detail in Section 
4; the theory and derivation of the model is given in Sections 4.1 to 4.2 
and its implementation in CARE-III is described in Sections 4.3 to 4.4. 
2.1 REQUIREMENTS FOR MODELING FTFCS's

In Phase I of the research conducted by the Raytheon Company, several 
FTFCS's were studied to determine requirements for the CARE-III
reliability model; these included SIFT, FTMP, ARCS and FTSC. The
requirements established by the Raytheon researchers, J. J. Stiffler, L. A.
Bryant and L. Guccione (1979) pp. 12-16, are quoted below:

• "Capability of modeling up to at least 40 stages" 

• "Multiple operating modes for each set of coupled stages" 

• "Separate coverage model similar to that in CARE-II but capable of 
handling latent and intermittent faults as well as permanent 
faults" 

• "Multiple success criteria" 

• "N-point failure mechanisms" 

• "Time-dependent hazard rates" 

• "Transient faults" 

• "Non-unity dormancy factors" 
2.2 FORMULATION OF CARE-III MODEL


In the usual reliability analysis of a system, the system is operational 
if at least a specified set of modules is operational. In a FTFCS, the 
system is operational if at least a specified set of modules is opera- 
tional, and the faulty modules have not caused a faulty operation of the 
system before they are detected, isolated, and replaced. The masking of 
the faulty operation of a module before it is replaced is called fault 
coverage and is part of the fault tolerance of the system. For a FTFCS 
the reliability of the system is the probability that at least a specified 
set of modules is operational and that all faulty modules have been cover- 
ed. 


2.2.1 Modules and Stages 

The basic units represented in CARE-III are modules (e.g., processor,
memory, bus); groups of identical modules are called stages. A stage is
operational if at least a specified number of modules in that stage are
operational.

For stage x, define

n(x) = number of stage x modules,

m(x) = minimum number of stage x modules necessary for stage operation,

ℓ(x) = number of failed stage x modules.

Then, assuming perfect coverage and independence of modules, the 
probability of stage x being operational is a sum of binomial probabili- 
ties (see Section 3.5.4). 
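A minimal sketch of that sum, assuming every module in the stage is independently operational at time t with the same probability p (for instance p = exp(-λt) for a constant failure rate λ, a simplifying assumption; Section 2.1 lists time-dependent hazard rates among the requirements). The stage is operational when ℓ(x) ≤ n(x) - m(x). The function name and the numbers in the usage lines are hypothetical.

```python
from math import comb, exp


def stage_operational_probability(n: int, m: int, p_module: float) -> float:
    """P(at least m of n independent, identical modules are operational).

    Perfect coverage is assumed, so the stage fails only when more than
    n - m modules have failed (spares exhaustion).
    """
    q = 1.0 - p_module  # probability that a single module has failed
    # Sum the binomial probabilities of k failed modules for k = 0..n-m.
    return sum(comb(n, k) * q**k * p_module**(n - k) for k in range(n - m + 1))


# Hypothetical triplex stage needing 2 of 3 modules, constant failure rate.
failure_rate = 1e-4  # per hour (illustrative value)
t = 10.0             # hours
p = exp(-failure_rate * t)
print(stage_operational_probability(3, 2, p))
```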

2.2.2 Exhaustion 

The system is composed of N independent stages. Assuming perfect 
coverage, the system is in either an operational or a failed state based 
on the state of each of the N stages and the system logic. Module status
is used only in determining the state of the stage. 

System failure caused by stage failure(s) is termed failure by exhaustion. 
This is best illustrated by example. Consider a system composed of two 
stages, x and y. This system is operational if either x or y
is operational. Then the reliability R of the system, using the
independence of stages, is


R = P (system is operational) 

= 1 - P (system failed) 

= 1 - P (stage x failed and stage y failed) 

= 1 - P (stage x failed) • P (stage y failed) 
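The same calculation in code, generalized to any number of independent stages in parallel (the system fails by exhaustion only when every stage has failed). The function name and the stage failure probabilities in the usage line are hypothetical; 0.028 is simply 1 - 0.972, the value a 2-out-of-3 stage with module reliability 0.9 would give.

```python
from math import prod
from typing import Sequence


def parallel_reliability(stage_failure_probs: Sequence[float]) -> float:
    """R = 1 - product of stage failure probabilities.

    Assumes independent stages and perfect coverage, so the system fails
    only when every stage has failed by exhaustion.
    """
    return 1.0 - prod(stage_failure_probs)


# Two hypothetical stages x and y in parallel.
print(parallel_reliability([0.028, 0.028]))
```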

2.2.3 Coverage 

For a FTFCS, the operational status of the stages, together with the system
architecture, is insufficient to determine whether the system is operational.
The status of each failed module must be known, together with some system
logic, to determine whether a failed module, or a pair of failed modules,
has caused propagation of an error and system failure before the failed
modules are detected, isolated, and replaced by spare modules. System 
failure due to improper error masking is termed a coverage failure. 

In the ultra-reliable FTFCS imagined for future aircraft, coverage failure 
is presumed to be the dominant mode of failure. To address this failure 
mode a detailed model is necessary to carry the relevant information on 
the status of the failed modules. 

A common fault-tolerant technique to mask errors is to use triplexes with
majority voting. Three identical modules perform the same task. If a
single module is faulty, the voter should mask the error. Failure of the
voter to mask the error to the system is a single fault coverage (system)
failure. If two modules in the triplex are faulty, the majority voter
cannot mask the error. These modules are said to be critically coupled.
There may be spare stage modules available, but the existence of two 
faulty modules in the triplex brings the system down before the faulty 
modules may be replaced. This is a double fault coverage (system) 
failure. 
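The triplex case can be made concrete with a toy voter. This illustrates majority voting in general and is not a CARE-III component; the function name and output values are hypothetical, and the two-fault case assumes the worst case in which both faulty modules produce the same wrong value.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of modules, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


correct = 42
print(majority_vote([correct, correct, 7]))  # single fault: error masked
print(majority_vote([correct, 7, 7]))        # two identical faults: error propagates
print(majority_vote([correct, 7, 9]))        # two different faults: no majority
```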

The coverage problem for a module is represented by a multi-state semi- 
Markov model discussed in Sections 2.5, 3.1 and 4.2. The coverage model 
addresses both single and double fault failure. Higher order failure
combinations are not addressed by CARE-III; in a pentaplex system with
3-out-of-5 majority voting, for example, critical triples would need to be
represented.

2.2.4 Fault Categories 

A module may be susceptible to more than one mode of failure, each mode 
having its own occurrence rate. These competing risks on the module may 
have different coverage parameters. This level of detail is included in 
the CARE-III model and discussed in Sections 2.5, 3.1 and 4.1.

2.2.5 State Space 

Given the system, stage, and coverage structure, the state of the system
is determined by the status of the stages and the faulty modules. The
module information must include which modules have failed, fault category,
and coverage state. Let (x,a) denote the a-th module in stage x. The
fault information for (x,a) is completely specified, up to sparing, by the
vector (d(x,a), i(x,a), c(x,a)) defined below. Sparing could be represented
by including an additional indicator variable.


$$d(x,a) = \begin{cases} 0 & \text{if } (x,a) \text{ operational},\\ 1 & \text{if } (x,a) \text{ faulty},\end{cases}$$

$$i(x,a) = \text{fault category } (0 \text{ to } 5),$$

$$c(x,a) = \text{coverage state of module } (A, B, \ldots).$$


Let M denote the number of modules in the system; then

$$M = \sum_{x=1}^{N} n(x),$$

and the state of the system is completely specified by the three
M-dimensional vectors:

$$\mathbf{d} = \bigl(d(1,1), \ldots, d(N,n(N))\bigr),$$

$$\mathbf{i} = \bigl(i(1,1), \ldots, i(N,n(N))\bigr),$$

$$\mathbf{c} = \bigl(c(1,1), \ldots, c(N,n(N))\bigr).$$

If there are only two stages (N = 2), with three modules per stage
(n(1) = n(2) = 3), three fault types, and five coverage states, the number
of (d, i, c) states is

$$7.29 \times 10^{8} = 2^{6} \cdot 3^{6} \cdot 5^{6}.$$

Not all of these states are possible. If d(x,a) = 0, then
i(x,a) = c(x,a) = 0. This reduces the number of states to

$$1.68 \times 10^{7} = (3 \cdot 5 + 1)^{6}.$$
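The two counts above can be checked with a few lines of arithmetic; the variable names are illustrative only.

```python
n_modules = 6        # N = 2 stages with n(1) = n(2) = 3 modules
fault_types = 3
coverage_states = 5

# Every module independently takes any (d, i, c) value.
all_combinations = 2**n_modules * fault_types**n_modules * coverage_states**n_modules

# If d = 0 then i = c = 0, so each module has 1 fault-free state plus
# fault_types * coverage_states faulty states.
feasible = (fault_types * coverage_states + 1) ** n_modules

print(f"{all_combinations:.3g}")  # 7.29e+08
print(f"{feasible:.3g}")          # 1.68e+07
```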

This number can be further reduced, since not all these states are
possible; yet the number of states remaining is still huge. Given the
failure rates for the fault types, the coverage parameters, and the 
system, stage and coverage structure, the stochastic model is fully 
specified. However, solving the resulting integral equations for state 
probabilities for even this simple system is not feasible. For any system 
of moderate size and complexity, N will be much larger, as will be n(x), 
and the number of states increases exponentially. Some solution approach 
is required which reduces the dimensionality of the problem. 
2.3 SOLUTION OF CARE-III MODEL


An approximate solution of the detailed reliability model for CARE-III is 
obtained by solving a reduced order reliability model that is constructed 
by an aggregation procedure. The states of the system are grouped into
aggregate states defined by the following system data:

• Number of faulty modules in each stage,

• System fault tree,

• Coverage structure,

• Critical pairs fault trees.

Rates for transitions between aggregate states are defined as aggregates 
of the rates for transitions between detailed states in the aggregate 
states. Approximate values for these rates are defined by an averaging 
procedure that decomposes the coverage and reliability calculations. The 
resulting reduced order reliability model is still a semi-Markov process; 
however, it is approximately a Markov process under the assumption that: 

• The coverage rates are much greater than the module failure rates. 

The aggregation procedure thus decomposes the solution of the detailed 
reliability model into the solution of two problems of lower dimension: 

• Coverage Model:

semi-Markov process, and

• Reliability Model:

non-homogeneous Markov process.

These models are discussed briefly in Sections 2.4 and 2.5 below and then 
in detail in Sections 3.1 - 3.4 and 4.1 - 4.2; the rest of this section 
outlines the construction of the aggregate state space. 
The aggregate states are indexed by "fault vectors" which specify the
number of faulty modules in each stage:

$$\mathcal{L} = \bigl\{\boldsymbol{\ell} = (\ell(1), \ell(2), \ldots, \ell(N)) : 0 \le \ell(x) \le n(x),\ 1 \le x \le N\bigr\}.$$

For a particular fault vector, stage x has failed by spares exhaustion if
ℓ(x) > n(x) - m(x); the system has failed by spares exhaustion if the
system fault tree specifies that the set of failed stages for ℓ is a
system failure. This decomposes the set of fault vectors into two sets:

$$\mathcal{L} = L \cup \bar{L},$$

where the system is operational for fault vectors in L and failed for
fault vectors in L̄.

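The aggregate states shown in Figure 2.3-1 are defined as follows:

• ℓ ∈ L̄:

$$H(\boldsymbol{\ell}) = \Bigl\{(\mathbf{d},\mathbf{i},\mathbf{c}) : \textstyle\sum_{a} d(x,a) = \ell(x),\ 1 \le x \le N\Bigr\};$$

• ℓ ∈ L:

$$G(\boldsymbol{\ell}) = \Bigl\{(\mathbf{d},\mathbf{i},\mathbf{c}) : \textstyle\sum_{a} d(x,a) = \ell(x),\ 1 \le x \le N, \text{ and } \mathbf{c} \text{ does not specify any single or double fault failure}\Bigr\},$$

$$F(\boldsymbol{\ell}) = \Bigl\{(\mathbf{d},\mathbf{i},\mathbf{c}) : \textstyle\sum_{a} d(x,a) = \ell(x),\ 1 \le x \le N, \text{ and } \mathbf{c} \text{ specifies at least a single or double fault failure}\Bigr\}.$$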
Figure 2.3-1 General Structure of CARE-III Aggregate Model (panels: operational states; failure due to coverage)



The criteria for including (d, i, c) in G(ℓ) or F(ℓ) based on c are
explained in detail in Sections 3.1.3 and 3.1.4, where the single and
double fault coverage models are described.

The possible transitions between the aggregate states are illustrated in
Figure 2.3-2; ℓ is an arbitrary fault vector and ℓ + 1(y) differs from
ℓ only in having one more fault in stage y:

• Case (a): G(ℓ) to F(ℓ)

In case (a), ℓ ∈ L and the only transition is from G(ℓ) to
F(ℓ) due to single fault coverage failure.

• Case (b): G(ℓ) to G(ℓ + 1(y)) or F(ℓ + 1(y))

In case (b), ℓ and ℓ + 1(y) ∈ L and the possible transitions are
from G(ℓ) to G(ℓ + 1(y)), or from G(ℓ) to F(ℓ + 1(y)) due to a
double fault coverage failure.

• Case (c): G(ℓ) to H(ℓ + 1(y))

In case (c), ℓ ∈ L and ℓ + 1(y) ∈ L̄, and the only transition is
from G(ℓ) to H(ℓ + 1(y)) due to system failure by exhaustion.

• Case (d): H(ℓ) to H(ℓ + 1(y))

In case (d), ℓ and ℓ + 1(y) ∈ L̄ and the only transition is from
H(ℓ) to H(ℓ + 1(y)). These transitions are included in the model
since the H(ℓ) states are not treated as absorbing states.

No transitions from the F(ℓ) states are defined in the model since the
F(ℓ) states are treated as absorbing states.
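The four cases can be summarized as a small enumeration routine. This is a sketch under stated assumptions: a stage fails by exhaustion when ℓ(x) > n(x) - m(x), the system fault tree is supplied as a hypothetical Python predicate over the tuple of stage-failed flags, and the fault tree is monotone (adding a fault never restores the system). Stages are indexed from 0 in the code. The routine only lists which transitions exist; it does not compute their rates, which Section 3.4 derives.

```python
from typing import Callable, List, Sequence, Tuple

FaultVector = Tuple[int, ...]


def make_partition(n: Sequence[int], m: Sequence[int],
                   system_ok: Callable[[Tuple[bool, ...]], bool]
                   ) -> Callable[[FaultVector], bool]:
    """Return a predicate that is True when fault vector ell is in L."""
    def in_L(ell: FaultVector) -> bool:
        # Stage x is exhausted when ell(x) > n(x) - m(x).
        failed = tuple(ell[x] > n[x] - m[x] for x in range(len(n)))
        return system_ok(failed)
    return in_L


def aggregate_transitions(ell: FaultVector, n: Sequence[int],
                          in_L: Callable[[FaultVector], bool]
                          ) -> List[Tuple[str, str, str]]:
    """List (source, target, case) for transitions out of G(ell) or H(ell)."""
    moves = []
    if in_L(ell):
        moves.append((f"G{ell}", f"F{ell}", "(a) single fault coverage failure"))
    for y in range(len(n)):
        if ell[y] == n[y]:
            continue  # stage y has no fault-free module left
        nxt = ell[:y] + (ell[y] + 1,) + ell[y + 1:]  # ell + 1(y)
        if in_L(ell) and in_L(nxt):
            moves.append((f"G{ell}", f"G{nxt}", "(b) new fault, no coverage failure"))
            moves.append((f"G{ell}", f"F{nxt}", "(b) double fault coverage failure"))
        elif in_L(ell):
            moves.append((f"G{ell}", f"H{nxt}", "(c) system failure by exhaustion"))
        elif not in_L(nxt):
            moves.append((f"H{ell}", f"H{nxt}", "(d) further fault after exhaustion"))
        # With a monotone fault tree, ell in L-bar and nxt in L cannot occur.
    return moves


n, m = (3, 3), (2, 2)
in_L = make_partition(n, m, lambda failed: not any(failed))  # stages in series
for move in aggregate_transitions((1, 1), n, in_L):
    print(move)
```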

A complete description of the aggregate reliability model is given in 
Section 3.2 and the calculation of the rates for transitions between 
aggregate states is presented in Section 3.4. The discussion illustrates 
how the aggregation procedure decomposes the solution of the high order 
detailed reliability model for CARE-III into the solution of the low order
coverage and aggregate reliability models. 
Figure 2.3-2 Transition Between Aggregate States (panels: cases (a) to (d))
2.4 RELIABILITY MODEL


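The reliability of the system at time t, R(t), is given by

$$R(t) = P\bigl(\text{system in state } G(\boldsymbol{\ell}) \text{ at time } t \text{ for some } \boldsymbol{\ell} \in L\bigr)$$

and is computed by

$$\begin{aligned}
R(t) &= 1 - P\bigl(\text{system in state } F(\boldsymbol{\ell}) \text{ at } t,\ \boldsymbol{\ell} \in L\bigr) - P\bigl(\text{system in state } H(\boldsymbol{\ell}) \text{ at } t,\ \boldsymbol{\ell} \in \bar{L}\bigr)\\
&= 1 - \sum_{\boldsymbol{\ell} \in L} P\bigl(F(\boldsymbol{\ell}) \text{ at } t\bigr) - \sum_{\boldsymbol{\ell} \in \bar{L}} P\bigl(H(\boldsymbol{\ell}) \text{ at } t\bigr)\\
&= 1 - \sum_{\boldsymbol{\ell} \in L} Q(t \mid \boldsymbol{\ell}) - \sum_{\boldsymbol{\ell} \in \bar{L}} S(t \mid \boldsymbol{\ell}).
\end{aligned}$$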
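The last line of the derivation is simple bookkeeping once Q(t|ℓ) and S(t|ℓ) are available. The sketch below assumes they are supplied as dictionaries keyed by fault vector, all evaluated at the same t; the function name and the probability values in the usage lines are hypothetical.

```python
from typing import Dict, Tuple

FaultVector = Tuple[int, ...]


def system_reliability(q: Dict[FaultVector, float],
                       s: Dict[FaultVector, float]) -> float:
    """R(t) = 1 - sum_{ell in L} Q(t|ell) - sum_{ell in L-bar} S(t|ell).

    q holds Q(t|ell) for the fault vectors in L (coverage failures) and
    s holds S(t|ell) for the fault vectors in L-bar (exhaustion failures).
    """
    if q.keys() & s.keys():
        raise ValueError("L and L-bar must be disjoint")
    return 1.0 - sum(q.values()) - sum(s.values())


# Hypothetical probabilities at some fixed t, for illustration only.
q_t = {(0, 0): 1e-9, (1, 0): 2e-9}
s_t = {(2, 0): 5e-8}
print(system_reliability(q_t, s_t))
```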

A system fault tree specifies the system structure in terms of the stages. 
These failure paths, together with the vector (n(x) - m(x)), determine the
sets L and L̄ for which the probabilities Q(t|ℓ) and S(t|ℓ) must be
computed (see Sections 2.3 and 3.5.2).

Consider a system with two stages. Let 

n(1) = n(2) = 3
m(1) = m(2) = 2

Then each stage may experience a single fault and still operate. If the
stages are in series, as described by the system fault tree, the
(ℓ(1), ℓ(2)) space is partitioned with respect to Q and S computation as
shown in Figure 2.4-1.

If the stages are in parallel, the (ℓ(1), ℓ(2)) space is partitioned as
shown in Figure 2.4-2.
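The partition shown in Figures 2.4-1 and 2.4-2 follows mechanically from the exhaustion rule and the fault tree. In the sketch below, the series and parallel fault trees are represented by hypothetical Python predicates over the stage-failed flags; fault vectors in L need Q(t|ℓ), and those in L̄ need S(t|ℓ).

```python
from itertools import product

n = (3, 3)  # n(1) = n(2) = 3
m = (2, 2)  # m(1) = m(2) = 2


def stage_failed(ell):
    """Stage x has failed by exhaustion when ell(x) > n(x) - m(x)."""
    return tuple(ell[x] > n[x] - m[x] for x in range(len(n)))


def partition(system_ok):
    """Split every fault vector into L (operational) or L-bar (exhausted)."""
    L, L_bar = [], []
    for ell in product(*(range(nx + 1) for nx in n)):
        (L if system_ok(stage_failed(ell)) else L_bar).append(ell)
    return L, L_bar


def series(failed):
    return not any(failed)  # system needs every stage


def parallel(failed):
    return not all(failed)  # system needs at least one stage


for name, tree in (("series", series), ("parallel", parallel)):
    L, L_bar = partition(tree)
    print(f"{name}: Q(t|l) needed for {L}")
    print(f"{name}: S(t|l) needed for {L_bar}")
```

For the series case only the four vectors with at most one fault per stage are in L; for the parallel case only the four vectors with both stages exhausted are in L̄.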
To obtain S(t|ℓ) and Q(t|ℓ) one must solve integral equations, since
the process is semi-Markov. The approximation of this process by a Markov
process, as discussed in Sections 2.3 and 3.2.4, permits one to obtain
S(t|ℓ) and Q(t|ℓ) from the differential equations (3.2-1 to 3.2-6).
Solution of these differential equations by direct numerical integration
was rejected by the CARE-III developers after considering a simple
difference equation. (Other numerical integration procedures are
available and should be considered.) The differential equations are
solved in CARE-III by using numerical quadrature methods to evaluate the
formal integral solution of the equations.
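As one example of such a procedure, the sketch below integrates a deliberately tiny, hypothetical reliability model directly with the classical fourth-order Runge-Kutta method. The model (one triplex stage with failure rate `LAM` and an instantaneous coverage probability `COV`) is invented for illustration and is not CARE-III's equations 3.2-1 to 3.2-6; it only shows that a standard method can integrate such a linear system directly.

```python
import math
from typing import Callable, List

Vector = List[float]


def rk4(rhs: Callable[[float, Vector], Vector], y0: Vector,
        t_end: float, n_steps: int) -> Vector:
    """Classical fourth-order Runge-Kutta integration of y' = rhs(t, y)."""
    h = t_end / n_steps
    t, y = 0.0, list(y0)
    for _ in range(n_steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = rhs(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


LAM = 1e-4   # hypothetical module failure rate per hour
COV = 0.999  # hypothetical single fault coverage probability


def triplex_rhs(t: float, y: Vector) -> Vector:
    """Hypothetical triplex: y = [P0, P1, Q, S] (no fault, one covered
    fault, coverage failure, exhaustion failure)."""
    p0, p1, _, _ = y
    return [-3 * LAM * p0,
            3 * LAM * COV * p0 - 2 * LAM * p1,
            3 * LAM * (1 - COV) * p0,
            2 * LAM * p1]


p0, p1, q, s = rk4(triplex_rhs, [1.0, 0.0, 0.0, 0.0], t_end=10.0, n_steps=10)
print("R(10 h) =", p0 + p1)
print("Q error vs closed form:", q - (1 - COV) * (1 - math.exp(-3 * LAM * 10.0)))
```

Here coverage is collapsed into a probability, so every rate is on the failure-rate scale. If the fast coverage transitions were kept as explicit rates, the system would be stiff, and an implicit method would be the natural choice over an explicit one like RK4.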

CARE-III does not solve the differential equations for the S(t|ℓ). Under
the assumption that the coverage transition rates are much larger than the
failure rates, the perfect coverage probabilities P*(t|ℓ) are solved for
(equation 3.3-1), where P*(t|ℓ) is just the product of the N binomial
probabilities for stage x to have ℓ(x) failures (3.3-2). The assumption
on relative rates suggests that P*(t|ℓ) should approximate S(t|ℓ); the
approximation errs on the conservative side, overestimating failure
probabilities. In the forward integral equations for Q(t|ℓ), P(t|ℓ) is
replaced by P*(t|ℓ) to yield an approximate solution for Q(t|ℓ)
(equation 3.3-4). This again errs on the conservative side, overestimating
Q(t|ℓ). Thus the reliability R(t) is underestimated. The error in these
approximations for extremely small probabilities is under current
investigation.
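The product form of P*(t|ℓ) is easy to sketch. The function below assumes, for simplicity, a constant failure rate per stage, so a module has failed by time t with probability 1 - exp(-λt); this assumption, the function name and the parameter values are illustrative only. Summing P* over the fault vectors in L̄ then gives the perfect coverage approximation to the total exhaustion probability.

```python
from math import comb, exp, prod
from typing import Sequence, Tuple


def p_star(ell: Tuple[int, ...], n: Sequence[int],
           rates: Sequence[float], t: float) -> float:
    """Perfect coverage probability P*(t|ell).

    Product over stages x of the binomial probability that exactly
    ell[x] of the n[x] modules have failed by time t, assuming
    independent modules with constant failure rate rates[x].
    """
    terms = []
    for lx, nx, lam in zip(ell, n, rates):
        q = 1.0 - exp(-lam * t)  # probability that one module has failed
        terms.append(comb(nx, lx) * q**lx * (1.0 - q)**(nx - lx))
    return prod(terms)


# Two hypothetical triplex stages in series (exhausted when any ell(x) > 1).
n, rates, t = (3, 3), (1e-4, 1e-4), 10.0
exhausted = [(a, b) for a in range(4) for b in range(4) if a > 1 or b > 1]
print(sum(p_star(ell, n, rates, t) for ell in exhausted))
```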

2.5 COVERAGE MODEL 

Each module may be subject to several failure modes, each with a different 
rate of occurrence. The coverage model represents probabilistically the 
fault tolerance part of the system. Different types of faults may have 
different probabilities of occurrence and different probabilistic coverage 
models. 
Modules are classified as either fault-free or faulty (Figure 2.5-1).
Modules are also classified as being either in-use, spare or isolated. A 
module has a latent failure if the module is faulty and in-use or spare. 
The coverage model addresses the ability of the system to survive latent 
in-use failures, the object of the fault tolerance of the system. 

There are nine states in the coverage model for a single module, as shown 
in Figure 2.5-2. The states are: 

A (active) - module capable of producing an error

A_E (active error) - module producing error(s)

B (benign) - module has not produced errors and is currently not faulty

B_E (benign error) - module has produced error(s) but is currently not faulty

F (failed) - module has caused system failure

A_D (active detected) - module in active state has been detected as faulty

B_D (benign detected) - module in benign state has been detected as faulty

DP (detected as permanent) - module has been detected as having a
permanent or intermittent fault and has been isolated from the system.
Module Classes

Fault-free : have not experienced a fault

Faulty : have experienced a fault

In-use : active part of system

Spare : ready for, but not currently a part of, the active system

Isolated : deleted permanently from the active system.

Figure 2.5-1 Module Classification
Legend: A: ACTIVE, B: BENIGN, D: DETECTED, E: ERROR, F: FAILURE,
DP: DETECTED AS PERMANENT (NON-TRANSIENT)

Figure 2.5-2 Single Fault Coverage Model


Three basic fault types are represented by the coverage model: permanent,
intermittent and transient faults. A permanent fault (Figure 2.5-3) remains
faulty and is either eventually detected and isolated (DP) or causes
system failure (F), the only absorbing states. An intermittent fault, as
shown by Figure 2.5-2, oscillates between the active and benign states
(A and B, or their subscripted counterparts) before it is isolated (DP) or
causes system failure (F). A transient fault (Figure 2.5-4) is a temporary
fault for a module. The module may cause system failure (F), be incorrectly
identified as a permanent failure and isolated (DP), or become error free (B).
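To make "coverage" concrete, the sketch below computes, for a toy single fault chain, the probability that a fault starting in state A is eventually isolated (DP) rather than causing system failure (F). The state names follow the list above, but the transition structure and all rates are hypothetical; they are not CARE-III's coverage model or parameters. The A to B to A loop mimics an intermittent fault; removing it leaves a permanent-fault chain.

```python
from typing import Dict

# Hypothetical transition rates (per second) for a toy coverage chain.
# Keys are source states; values map destination states to rates.
RATES: Dict[str, Dict[str, float]] = {
    "A":   {"A_D": 2.0, "A_E": 1.0, "B": 5.0},  # detect, produce error, go benign
    "B":   {"A": 4.0},                          # intermittent fault reactivates
    "A_E": {"A_D": 3.0, "F": 1.0},              # error caught vs. propagated
    "A_D": {"DP": 10.0},                        # isolation after detection
}
ABSORBING = {"DP": 1.0, "F": 0.0}  # value = 1 if the fault was covered


def coverage_probability(rates: Dict[str, Dict[str, float]],
                         absorbing: Dict[str, float],
                         sweeps: int = 500) -> Dict[str, float]:
    """Probability of absorption in DP rather than F from each state.

    First-step analysis on the embedded jump chain,
    p(s) = sum_j (r_sj / r_s) * p(j), iterated in place to a fixed point.
    """
    p = {s: 0.0 for s in rates}
    p.update(absorbing)
    for _ in range(sweeps):
        for s, out in rates.items():
            total = sum(out.values())
            p[s] = sum(r / total * p[dest] for dest, r in out.items())
    return p


probs = coverage_probability(RATES, ABSORBING)
print(round(probs["A"], 6))
```

In this toy chain the intermittent loop changes only the timing, not the absorption probability, because B can only return to A; a chain in which detection or errors can also occur from the benign states would behave differently.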

Two types of coverage failures are represented in CARE-III. A single
fault coverage failure occurs when a faulty module propagates an error
before the module is detected and isolated. A double fault coverage failure
occurs when there exists a pair of in-use modules with latent faults which
together cause system failure. The double fault coverage model may be
looked at as a combination of two single fault coverage models. Of the
9X9 = 81 potential states, various combinations of single fault non- 
failure states are double fault system failure states (AA, AB £ and A^B). 
Some combinations are not possible, while others revert to the single 
fault coverage model. 

When specifying fault modes for a stage, the specification of the transition rates between the coverage states defines the fault type. The critically coupled modules, within or between stages, which may cause a double fault failure are specified by the user through a critical-pairs fault tree. The single and double fault models are discussed in Sections 3.1.3 and 3.1.4. The derivation of the coverage rates for the reliability equations is given in Section 4.




A: ACTIVE 
B: BENIGN 

D: DETECTED 

E: ERROR 

F: FAILURE 

DP: DETECTED AS PERMANENT 


Figure 2.5-4 Single Transient Fault Coverage Model 
Section 3
RELIABILITY MODEL 

The theory and implementation of the CARE-III reliability model are described in this section. The mathematical details of the reliability model have been extracted from the CARE-III documentation, J. J. Stiffler, L. A. Bryant and L. Guccione (1979), J. J. Stiffler and L. A. Bryant (1982), J. J. Stiffler, J. S. Neumann and L. A. Bryant (1982), and the CARE-III program (Version 3, 1982). In those areas where the documentation is vague or incomplete, BCS has completed the model specifications based on its understanding of the applicable reliability methods.

Section 3.1 presents a specification of the detailed CARE-III reliability 
model including the concepts of spares exhaustion, and single or double 
fault failures; the model is characterized by a detailed state space 
model. The solution of the CARE-III model is described in Sections 3.2 to 
3.4; first the aggregation procedure used to reduce the order of the reli- 
ability model is presented in Section 3.2, then the solution procedure 
used to solve the model is given in Section 3.3, and finally, the calcula- 
tion of the transition rates for the model is described in Section 3.4. 
The implementation of the CARE-III model is described in Sections 3.5 to 
3.6; first an overview of the CARE3 program is presented in Section 3.5, 
and then the computational methods used in CARE3 are highlighted in 
Section 3.6. 

Any discrepancies between the CARE-III documentation or the CARE3 program and the reliability model, as found by BCS during the Task 1 review, are pointed out in the discussion in Section 3.
3.1 DETAILED RELIABILITY MODEL


3.1.1 Model Specifications 

The objective of the model is to calculate the reliability of a complex, redundant fault-tolerant system. Such highly reliable systems can fail due to exhaustion of spare resources, but the dominant cause of failure tends to be failure to detect and isolate malfunctioning elements, i.e., coverage failures.

The system modeled by CARE III consists of a number of stages (up to 70), 
and each of these is composed of one or more identical interchangeable 
modules. 

Recall the notation introduced in Section 2.2 where for stage-x, 

(x,a) = a-th module in stage-x, 
n(x) = number of modules in stage-x, 

m(x) = minimum number of modules necessary for stage-x to be oper- 
ational. 

Furthermore, for each stage-x, the vector NOP(x) indicates the number of modules in use as a function of the number of latent and fault-free modules. NOP(x) is a vector of integers $(q(1,x), q(2,x), \dots)$ with $n(x) \ge q(1,x) > q(2,x) > \dots \ge m(x)$. If s modules have been deleted and $q(i-1,x) > n(x) - s \ge q(i,x)$, with the convention $q(0,x) = n(x) + 1$, then $q(i,x)$ modules are in use, and $n(x) - s - q(i,x)$ modules are treated as operational spares.
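The NOP(x) selection rule can be illustrated with a short Python sketch. It assumes the reading of the rule given above (strictly decreasing entries, q(0,x) = n(x) + 1); the function name `in_use_and_spares` and the convention of returning `None` when fewer modules remain than the last NOP entry are hypothetical and are not taken from the CARE3 program.

```python
from typing import Optional, Sequence, Tuple


def in_use_and_spares(n: int, nop: Sequence[int], s: int) -> Optional[Tuple[int, int]]:
    """Return (modules in use, operational spares) for one stage.

    n   -- number of modules in the stage, n(x)
    nop -- NOP(x) = (q(1,x), q(2,x), ...), strictly decreasing
    s   -- number of modules deleted so far
    """
    remaining = n - s
    upper = n + 1  # convention q(0,x) = n(x) + 1
    for q in nop:
        # select the entry q(i,x) with q(i-1,x) > n(x) - s >= q(i,x)
        if upper > remaining >= q:
            return q, remaining - q
        upper = q
    return None  # hypothetical convention: fewer modules left than any NOP entry
```

For example, with n(x) = 5 and NOP(x) = (5, 3), one deletion leaves 4 modules, of which 3 are in use and 1 is held as a spare.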

Figure 2.5-1 shows the relationships between faulty, latent, in-use and 
spare modules. 
In the present model each stage-x module can suffer from one of several categories of faults (up to 5). The j-th fault category for a stage-x module is denoted by $x_j$ and is defined by its rate of occurrence,

$$ \lambda(t \mid x_j) = \lambda(x_j)\,\omega(x_j)\,t^{\omega(x_j)-1}, $$

and by the fault coverage parameters (which characterize whether the fault is permanent, intermittent or transient, its detection and isolation schedules, etc.).
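The occurrence rate above has the Weibull form; when $\omega(x_j) = 1$ it reduces to the constant rate $\lambda(x_j)$. A minimal Python sketch of the rate and of its integral from 0 to t (the quantity that enters the module reliability) follows; the function names are illustrative only.

```python
def fault_rate(t: float, lam: float, omega: float) -> float:
    """Occurrence rate lambda(t | x_j) = lam * omega * t**(omega - 1), for t > 0."""
    return lam * omega * t ** (omega - 1.0)


def cumulative_fault_rate(t: float, lam: float, omega: float) -> float:
    """Integral of fault_rate over [0, t], which equals lam * t**omega."""
    return lam * t ** omega
```

With omega = 1 the rate is constant in time; omega > 1 gives a rate that increases with age and omega < 1 one that decreases.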

A permanent or intermittent fault is said to be latent from the time it 
first occurs until it is detected and isolated from the system. A tran- 
sient fault is said to be latent from the time it first occurs until it is 
either detected and isolated or reaches a benign state. 

As discussed in Section 2.2, there are several causes of coverage failure:

Single Fault Failure

C1. An existing latent fault causes the system to take some unacceptable action.

Double Fault Failures

C2. A new fault occurs which, in combination with an existing latent fault, prevents the system from functioning properly;

C3. A pair of existing latent faults for the first time reaches a system-disabling state.

The analysis of the first cause of coverage failure and of the latency 
period of a fault is based on the Single Fault Coverage Model described in 
Section 3.1.3. 





The last two causes of failure are analyzed in Section 3.1.4 and are ap- 
plicable only for the case of interacting modules, i.e., critical pairs of 
modules. The set of such pairs denoted by CP depends on the architecture 
of the system and is defined by a Critical Pairs Fault Tree. 

3.1.2 Spares Exhaustion Failure

The state of the system is represented by the vector $\ell = (\ell(1), \ell(2), \dots, \ell(x), \dots)$, where $\ell(x)$ indicates the number of stage-x modules that have experienced a fault. Each stage-x is said to have failed (due to spares exhaustion) if the number of operational modules falls below the allowed minimum, i.e., $n(x) - \ell(x) < m(x)$. System failure due to exhaustion of spares is then defined by a combination of stage failures introduced by a System Fault Tree. The set of spares exhaustion states is denoted by $\bar{L}$.
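The stage-failure test and its combination through a fault tree can be sketched in Python as follows. The System Fault Tree is represented here by a hypothetical callable over per-stage failure flags; the two trees shown (series and parallel) are illustrative examples and are not taken from any CARE-III input.

```python
from typing import Callable, Iterable, Sequence


def stage_failed(n: int, m: int, ell: int) -> bool:
    """Stage x fails by spares exhaustion when n(x) - l(x) < m(x)."""
    return n - ell < m


def spares_exhausted(n: Sequence[int], m: Sequence[int], ell: Sequence[int],
                     fault_tree: Callable[[Iterable[bool]], bool]) -> bool:
    """True if the fault vector ell lies in the spares-exhaustion set.

    fault_tree is a hypothetical stand-in for the System Fault Tree: it maps the
    per-stage failure flags to a system-failure verdict.
    """
    flags = [stage_failed(nx, mx, lx) for nx, mx, lx in zip(n, m, ell)]
    return fault_tree(flags)


# Two illustrative fault trees:
series_tree = any    # system fails if any stage fails
parallel_tree = all  # system fails only if every stage fails
```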

3.1.3 Single Fault Coverage Failure 

The Single Fault Coverage Model, SFCM, defines the coverage structure and 
helps analyze the latency period of a fault. The SFCM is shown in Figure 
3.1-1, and its dynamics are described in what follows. 

When a fault first occurs, it is said to be in the active state A (see Figure 3.1-1). If the fault is transient or intermittent, it may jump from the active to the benign state B. These transitions take place at a constant rate $\alpha$; for permanent faults, $\alpha = 0$. If the fault is intermittent, the reverse, benign-to-active, transition takes place at some constant rate $\beta$; for transient faults, $\alpha \neq 0$ and $\beta = 0$. In the benign state, the fault is incapable of causing any discernible malfunction. Thus, it can neither be detected nor produce erroneous output. In the active state, however, the fault is both detectable and capable of producing incorrect output. The rates at which either of these events takes place depend upon the operating environment and, in particular, on how often the faulty element is exercised in a way that causes the defect to manifest itself.


A: ACTIVE
B: BENIGN
D: DETECTED
E: ERROR
F: FAILURE
DP: DETECTED AS PERMANENT (NON-TRANSIENT)


Figure 3.1-1 Single Fault Coverage Model


If the fault is detected, the system enters the active-detected state $A_D$, and if it produces an error, it enters the active-error state $A_E$. These transitions occur at time t, as measured from the last entry into state A, according to the probability density functions $\delta(t)$ and $\rho(t)$, respectively. Once the system is in the active-error state $A_E$, if the fault is either intermittent or transient, it may jump to the benign state. The error is still present, so the state is designated the benign-error state $B_E$. The composition of the two error states, $A_E$ and $B_E$, is denoted the error state E. When the faulty element is in the error state, the error propagates, t time units after entry into $A_E$ and according to the probability density function $\varepsilon(t)$, to some point in the system. At this point the error is either detected or else propagates further, resulting in system failure, i.e., the process enters state F (coverage failure C1). The probabilities of these two alternatives are C and 1 - C, respectively.

If the fault is detected, either through testing or through the detection of an erroneous output, the faulty element enters the active-detected state $A_D$ or the benign-detected state $B_D$, depending on the state of the fault when it was detected. At that time a decision is made as to whether the faulty element is to be retired from the system or whether it can continue to be used. The latter decision might be made, for example, if the fault recovery procedure included a diagnostic routine designed to distinguish between permanent and transient faults. If the fault is detected in the active state, the decision that the element must be retired from service is made with probability $P_A$; if it is detected in the benign state, the same decision is made with probability $P_B$. In either case the process jumps to the detected-as-permanent state DP. Thus, with probabilities $1 - P_A$ and $1 - P_B$, respectively, the faulty element is returned to service following the detection of the fault. (The dashed lines in Figure 3.1-1 indicate that the transition takes place immediately with the probability indicated.)

The model assumes that the effect of a decision that the fault is transient is to eliminate the error, if an error had already been produced, and to return the faulty element to the error-free active or benign state, depending on its state when the fault was detected. If the fault was transient and detected in the benign state, it returns to the error-free benign state. Since $\beta = 0$, it can never again become active, so it ceases to pose any further threat to the system. If the fault is transient and detected as transient in the active state, it remains latent and may have another chance to cause system failure. If the fault is permanent or intermittent and is detected as transient in either detected state, $A_D$ or $B_D$, it remains latent and may have another chance to cause system failure.

In summary, in the SFCM,

• the states are:

    A        active
    B        benign
    $A_D$    active detected
    $B_D$    benign detected
    $A_E$    active error
    $B_E$    benign error
    E        error (combination of $A_E$ and $B_E$)
    DP       detected as permanent
    F        failure

• the parameters for the transitions are:

    $\alpha$            transition rate from A to B or from $A_E$ to $B_E$
    $\beta$             transition rate from B to A or from $B_E$ to $A_E$
    $\delta(t)$         p.d.f. for transition from A to $A_D$, where t is measured from the time of last entry into A
    $\rho(t)$           p.d.f. for transition from A to $A_E$, where t is measured from the time of last entry into A
    $\varepsilon(t)$    p.d.f. for transition out of E, where t is measured from the time of last entry into E
    C                   probability that a propagated error is detected before it causes the system to malfunction
    $P_A$               probability that a fault detected in the active state is detected as permanent
    $P_B$               probability that a fault detected in the benign state is detected as permanent
In the present version of CARE III, the functions $\delta(t)$, $\rho(t)$ and $\varepsilon(t)$ are restricted to be either exponential or uniform densities, i.e., of the form

$$ \theta e^{-\theta t}, \quad t \ge 0, \qquad \text{or} \qquad \theta, \quad 0 \le t \le 1/\theta, $$

for some constant $\theta$. However, the results obtained in Sections 3.2, 3.3, 3.4 and 4.2 are valid even when these functions are arbitrary densities with support on the positive time axis.
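Both admissible density forms can be evaluated and sampled by inverse-transform sampling, as in the Python sketch below. It is illustrative only and assumes nothing about how CARE3 itself uses these densities; note that, for the same $\theta$, the exponential form has mean $1/\theta$ while the uniform form has mean $1/(2\theta)$. The function names are hypothetical.

```python
import math
import random


def exponential_density(t: float, theta: float) -> float:
    """theta * exp(-theta * t) for t >= 0, zero otherwise."""
    return theta * math.exp(-theta * t) if t >= 0 else 0.0


def uniform_density(t: float, theta: float) -> float:
    """theta on 0 <= t <= 1/theta, zero otherwise."""
    return theta if 0 <= t <= 1.0 / theta else 0.0


def sample_holding_time(kind: str, theta: float, rng: random.Random) -> float:
    """Draw a time from either density by inverse-transform sampling."""
    u = rng.random()  # uniform on [0, 1)
    if kind == "exponential":
        return -math.log(1.0 - u) / theta  # inverse of the c.d.f. 1 - exp(-theta t)
    if kind == "uniform":
        return u / theta  # inverse of the c.d.f. theta * t on [0, 1/theta]
    raise ValueError(f"unknown density kind: {kind!r}")
```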

The transitions in this process occur either at constant rates, instantaneously, or according to density functions. The parameters that govern these transitions are independent of the time at which they occur, so the process is time homogeneous. The transitions governed by the densities $\delta(t)$, $\rho(t)$ and $\varepsilon(t)$ are assumed to be independent of past dynamics, given that time t is measured from the last entry into state A or E, so the Markov property holds at jump times. It follows that the Single Fault Coverage Model satisfies the conditions of a semi-Markov process (see Appendix A). If all three densities $\delta(t)$, $\rho(t)$ and $\varepsilon(t)$ are exponential, then the SFCM is a Markov process. In Section 4.2 properties of such processes are used to calculate the state probabilities and intensities of entry that are needed to solve the reliability problem.
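For experimentation, the SFCM dynamics can be simulated directly in the Markov special case (all three densities exponential). The Python sketch below is an illustrative Monte Carlo model written from the description in this section, not the CARE3 algorithm, which computes these probabilities analytically (Section 4.2). Assumptions: the state labels are hypothetical strings; from $A_E$ and $B_E$ the only exits are the $\alpha$/$\beta$ switches and the exit out of E at rate $\varepsilon$; the $A_D$/$B_D$ decisions are taken instantaneously.

```python
import random


def simulate_sfcm(alpha, beta, delta, rho, eps, C, PA, PB, horizon, rng):
    """Simulate one fault in the SFCM (all densities exponential) up to `horizon`.

    Returns "DP" or "F" if an absorbing state is reached, otherwise the
    latent state ("A", "B", "AE" or "BE") occupied at the horizon.
    """
    state, t = "A", 0.0
    while True:
        # competing exponential transitions out of the current state
        if state == "A":
            rates = {"B": alpha, "AD": delta, "AE": rho}
        elif state == "B":
            rates = {"A": beta}
        elif state == "AE":
            rates = {"BE": alpha, "exitE": eps}
        else:  # "BE"
            rates = {"AE": beta, "exitE": eps}
        total = sum(rates.values())
        if total == 0.0:
            return state  # e.g. a transient fault resting in B with beta = 0
        t += rng.expovariate(total)
        if t > horizon:
            return state
        target = rng.choices(list(rates), weights=list(rates.values()))[0]
        if target == "exitE":
            # the error reaches some point in the system: detected with probability C
            if rng.random() >= C:
                return "F"  # coverage failure C1
            target = "AD" if state == "AE" else "BD"
        # instantaneous decisions in the detected states (dashed lines)
        if target == "AD":
            if rng.random() < PA:
                return "DP"
            target = "A"  # judged transient: error removed, back to A
        elif target == "BD":
            if rng.random() < PB:
                return "DP"
            target = "B"  # judged transient: error removed, back to B
        state = target
```

As a sanity check, for a permanent fault ($\alpha = \beta = 0$) with C = 0 and $P_A = 1$, the fraction of runs ending in DP should approach $\delta/(\delta + \rho)$, since every error then leads to F.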

3.1.4 Double Fault Coverage Failure 

The dynamics that lead to failures due to interacting modules can be analyzed using the corresponding pair of Single Fault Coverage Models and determining if, and when, the two independent fault states form some lethal combination. CARE III takes a simplified, conservative approach, described in the following paragraph and summarized in Table 3.1-1.

When the second module experiences a fault, the first module can be in any coverage state. If the first module is either active (A) or in error (E), the system is assumed to fail immediately (coverage failure C2); if the first module created errors that escaped undetected (F), the system has already failed and all future analysis is irrelevant; if the first module has been deleted from the system (DP), the future dynamics of the system are independent of it. If the first module is benign (B), the system enters the Double Fault Coverage Model, as described in the following paragraphs.
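The decision rule of the preceding paragraph (also tabulated in Table 3.1-1) amounts to a lookup on the first module's coverage state. The Python sketch below is illustrative; the state labels and the enum names are hypothetical, and the transient detected states ($A_D$, $B_D$), which are left instantaneously, are deliberately rejected.

```python
from enum import Enum


class SecondFaultOutcome(Enum):
    """Consequence when a critically paired second module becomes faulty."""
    IMMEDIATE_FAILURE = "immediate failure (coverage failure C2)"
    ALREADY_FAILED = "system already failed (coverage failure C1)"
    INDEPENDENT = "first module deleted; future analysis independent of it"
    ENTER_DFCM = "interaction of faults; enter the DFCM (possible C3)"


def second_fault_outcome(first_state: str) -> SecondFaultOutcome:
    """Map the first module's SFCM state to the outcome described above.

    State labels are hypothetical strings: "A", "AE", "BE", "B", "F", "DP".
    """
    if first_state in ("A", "AE", "BE"):  # active or in error
        return SecondFaultOutcome.IMMEDIATE_FAILURE
    if first_state == "F":
        return SecondFaultOutcome.ALREADY_FAILED
    if first_state == "DP":
        return SecondFaultOutcome.INDEPENDENT
    if first_state == "B":
        return SecondFaultOutcome.ENTER_DFCM
    raise ValueError(f"state not covered by this rule: {first_state!r}")
```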

Double Fault Coverage Model 


When the first module is in the benign B state the analysis is based on 
the Double Fault Coverage Model, DFCM, which corresponds to a simplified 
version of a combination of the corresponding single models. The DFCM is 
shown in Figure 3.1-2. 

The DFCM is entered at a rate that depends on the first fault being benign and on the rate of occurrence of the second fault. Such a situation places the fault pair in the $B_1A_2$ state (first fault benign, second fault active). From there, the fault pair can go to the $B_1B_2$ state (both faults benign) if the second fault becomes benign before the first fault becomes active; to the state $B_1D_2$ if the active fault is detected; or to the failed state DF (double fault) if the first fault becomes active while the second fault is still in the active state, or if the second fault causes an error to be produced (coverage failure C3). From the state $B_1D_2$ it can either go to $B_1DP_2$ or return to $B_1A_2$, depending on whether the second fault is detected as permanent or not. In the state $B_1DP_2$ the second module is deleted from the system. If the first module is interacting with some other latent module, it continues to be analyzed within the context of the DFCM. If not, it is analyzed in the context of the SFCM.
TABLE 3.1-1

COVERAGE FAILURES FOR CRITICAL PAIRS

Coverage state of first      Consequence              Future analysis          Type of coverage
module when second fault                                                       failure
becomes active
---------------------------  -----------------------  -----------------------  ----------------
A, A_E or B_E                Immediate failure        -                        C2
F                            System already failed    -                        C1
DP                           First module deleted     Independent of first     -
                             from system              module
B                            Interaction of faults    Enters DFCM              Possible C3
Figure 3.1-2 Double Fault Coverage Model



Since both faults are benign in the $B_1B_2$ state, the only possible transitions from that state are back to the $B_1A_2$ state or to the $A_1B_2$ state (first fault active, second fault benign), which has entirely analogous transitions.

The argument used for the SFCM shows that the Double Fault Coverage Model is a semi-Markov process. The intensity of entry into state DF characterizes coverage failures of the third type, C3, and its formula is derived in Section 4.2.

3.1.5 State Space Definition 

The state of the system is determined at each time by the status of each 
module: 

• A fault has occurred or not; 

• Category that caused the fault (if fault occurred); 

• Coverage fault status (if fault occurred): active, benign, detected, etc;

• Spare status of a latent or fault free module. 

Analytically, the states are described by the three M-dimensional vectors d, i and c, which have been defined in Section 2.2. The information given by these vectors corresponds, respectively, to the occurrence of faults, the fault categories and the coverage status.

Part of the information given by the triple (d, i, c) is summarized in the vectors $\ell = (\ell(1), \ell(2), \dots, \ell(x), \dots)$ and $\mu = (\mu(1), \mu(2), \dots, \mu(x), \dots)$, where $\ell(x)$ and $\mu(x)$ denote the numbers of stage-x faulty and latent modules, respectively. The triple (d, i, c) gives no information on which latent and fault-free modules are in use and which are spare. This information shall be assumed implicit, and will be used in the classification of states, the description of transitions and the evaluation of aggregate rates.
Classification of States


The states of the model can be classified according to the failure status of the system:

• The set of spares exhaustion states is denoted by $\bar{L}$ and is defined as some combination of unions and/or intersections of sets of the form

$$ \{ (d, i, c) \mid n(x) - \ell(x) < m(x) \}. $$

This last set corresponds to all states for which stage-x has fewer than m(x) operational modules (i.e., "failure" of stage-x). Such a (d, i, c) state shall be said to be an H state.

• $L$ denotes the complement of $\bar{L}$.

A state (d, i, c) in L, although not defining failure due to spares exhaustion, can represent a case of coverage failure. To isolate such cases, consider the latent in-use modules determined by (d, i, c), i.e., the non-deleted, non-spare faulty modules. Of these, consider all possible latent critical pairs and all other latent modules (called single modules hereafter).

The state (d, i, c) is an F (failure) state if either there is a latent single module that created an undetected error (i.e., its c-component is F), or there is a latent critical pair in state DF, as given in Figure 3.1-2.

If none of the above conditions is satisfied, the state (d, i, c) is called a G (operational) state.
3.1.6 Stochastic Characteristics of the Model


The stochastic model for this state space is characterized by a mixture of

• time-dependent rates for the occurrence of faults;

• semi-Markov processes for the coverage dynamics.

Backward integral equations, similar to those given in the Appendix, could be used to calculate the probabilities that the system is in an operational state. The reliability of the system would then be obtained by summing these probabilities.

This approach, though straightforward, requires so many calculations that even for moderate size systems the problem becomes unmanageable.

Two steps are taken to obtain a feasible solution. First, states with the 
same number of faults and similar failure characteristics are collected 
into aggregate states, thus reducing the size of the state space. Second, 
detailed information given by the individual states forming an aggregate 
state is replaced by probabilistic statements, thus allowing a decomposi- 
tion between the Coverage Models and the Aggregate Reliability Model. The 
Coverage Models are used to derive the transition rates for the Aggregate 
Reliability Model, which, under certain conditions on holding times of the 
original process, is solved as a non-homogeneous Markov process. 

In Section 3.2 the Aggregate Model will be described in detail, excluding only the case of transient faults. As mentioned in J. J. Stiffler, J. S. Neumann and L. A. Bryant (1982), these cases showed instabilities. A model with transient faults would include backward transitions, e.g., from $\ell$ to $\ell - 1(x)$ if some transient stage-x fault becomes benign. This step is avoided in the present version of CARE III by an approximation based on the relative speeds of the coverage transitions and of the occurrence of faults. As an example, consider a transient fault as given in Figure 3.1-3a, i.e., the fault is either detected or becomes benign with constant rates $\delta$ and $\alpha$, respectively. The probability of being in state D within the coverage model is given by

$$ P_D(t) = \frac{\delta}{\alpha + \delta}\left[1 - \exp\big(-(\alpha + \delta)t\big)\right]. $$

Figure 3.1-3b shows the dynamics of this fault, including the occurrence of the fault and its possible recurrences. The probability of being in state D within the whole system is then given by

$$ H_D(t) = \int_0^t \frac{\lambda\delta}{s_2 - s_1}\left[\exp(-s_1 u) - \exp(-s_2 u)\right] du, $$

where $s_1$ and $s_2$ are the roots of

$$ s^2 - (\lambda + \alpha + \delta)\,s + \lambda\delta = 0. $$
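To make the example concrete, the Python sketch below evaluates $P_D(t)$ and the closed form of $H_D(t)$, and cross-checks $H_D(t)$ by integrating a three-state Markov chain. Reading Figure 3.1-3b as the chain N to A at rate $\lambda$ (fault occurrence or recurrence), A to N at rate $\alpha$ (fault becomes benign) and A to D at rate $\delta$ is an assumption; it is consistent with the characteristic equation quoted here, whose roots are the decay rates of that chain. The state label N and all function names are hypothetical.

```python
import math


def p_detected_coverage(t, alpha, delta):
    """P_D(t) = delta/(alpha+delta) * (1 - exp(-(alpha+delta) t)) (Figure 3.1-3a)."""
    k = alpha + delta
    return delta / k * (1.0 - math.exp(-k * t))


def roots(lam, alpha, delta):
    """Roots s1 <= s2 of s^2 - (lam+alpha+delta) s + lam*delta = 0."""
    b = lam + alpha + delta
    disc = math.sqrt(b * b - 4.0 * lam * delta)  # nonnegative: >= (lam - delta)^2
    s2 = (b + disc) / 2.0
    s1 = lam * delta / s2  # product of roots; avoids cancellation when s1 is tiny
    return s1, s2


def h_detected_system(t, lam, alpha, delta):
    """Closed form of H_D(t) = int_0^t lam*delta/(s2-s1) [exp(-s1 u) - exp(-s2 u)] du."""
    s1, s2 = roots(lam, alpha, delta)  # assumes s1 != s2
    c = lam * delta / (s2 - s1)
    return c * (-math.expm1(-s1 * t) / s1 + math.expm1(-s2 * t) / s2)


def h_detected_euler(t, lam, alpha, delta, steps=200_000):
    """Cross-check by explicit Euler on N -lam-> A, A -alpha-> N, A -delta-> D."""
    dt = t / steps
    pn, pa, pd = 1.0, 0.0, 0.0
    for _ in range(steps):
        dn = -lam * pn + alpha * pa
        da = lam * pn - (alpha + delta) * pa
        dd = delta * pa
        pn, pa, pd = pn + dt * dn, pa + dt * da, pd + dt * dd
    return pd
```

Because $s_1 s_2 = \lambda\delta$, the closed form tends to 1 as t grows: in this example the transient fault is eventually detected with certainty.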


If the parameter $\delta$ is much larger than $\lambda$, then $H_D(t)$ can be approximated by



This last integral corresponds to the value of $H_D(t)$ as presently calculated in CARE III.


The effect of this approach to handling transient faults on the calculation of transition rates and state probabilities in the Aggregate Model is not fully understood. This problem and the instability encountered by Raytheon in Phase III are not addressed in this report.
Figure 3.1-3a Example of Transient Fault



Figure 3.1-3b Dynamics of Transient Fault 




3.2 REDUCED RELIABILITY MODEL 


3.2.1 Aggregate States 

The reduced state space is formed by aggregation of states with identical 
number of faults per stage and similar failure characteristics. 

As defined in Section 2.3, the possible aggregates for each fault vector are:

if $\ell$ is in $\bar{L}$,

$H(\ell)$: aggregate of spares exhaustion states with $\ell$ faults;

and if $\ell$ is in $L$,

$G(\ell)$: aggregate of operational states with $\ell$ faults,

$F(\ell)$: aggregate of failure states with $\ell$ faults.

3.2.2 Transitions and Rates

As discussed in Section 2.3, the transitions and rates in the Aggregate Model are:

• From $H(\ell)$ to $H(\ell + 1(y))$ with rate $\lambda^*(t \mid \ell, \ell + 1(y))$, if a fault occurs on a fault-free stage-y module;

• From $F(\ell)$ no transitions are possible, since these states are absorbing;

• From $G(\ell)$ to $F(\ell)$ with rate $\mu(t \mid \ell)$, if a coverage failure occurs;

• From $G(\ell)$ to either

    $H(\ell + 1(y))$ with rate $\lambda^*(t \mid \ell, \ell + 1(y))$,
    $F(\ell + 1(y))$ with rate $\lambda^{(2)}(t \mid \ell, \ell + 1(y))$, or
    $G(\ell + 1(y))$ with rate $\lambda^{(1)}(t \mid \ell, \ell + 1(y))$,

  if a fault occurs on a fault-free stage-y module.

To determine under which conditions the different transitions out of G( l ) 
occur, it is necessary to disaggregate this state and to analyze the 
dynamics within each of its parts. 

Fix a state (d, i, c) in $G(\ell)$. Such a state divides the latent in-use modules into two groups:

• Interacting modules: those latent in-use modules which form a critical pair with another latent in-use module. Such a pair of modules will be called an interacting pair.

• Single modules: those latent in-use modules which are not interacting.

The possible transitions out of the state $G(\ell)$ are then as follows:

(1) To $F(\ell)$ if either

(1.a) A single module created an error that escaped undetected (i.e., a transition from E to F in the corresponding SFCM), or

(1.b) In an interacting pair, with one active module and the other benign, either the active module created an error or the benign module became active (i.e., a transition from AB to DF in the corresponding DFCM).

(2) To $H(\ell + 1(y))$ if both

(2.a) A fault occurs on a fault-free stage-y module, and

(2.b) $\ell + 1(y)$ is in $\bar{L}$.
(3) To $F(\ell + 1(y))$ if the following four conditions hold:

(3.a) A fault occurs on a fault-free stage-y module,

(3.b) $\ell + 1(y)$ is in $L$,

(3.c) The new faulty module is in use, and

(3.d) A latent in-use module is critically paired with the new faulty module and either

(3.d.1) Is single and non-benign (i.e., active or in error), or

(3.d.2) Is interacting and active.

(4) To $G(\ell + 1(y))$ if the following three conditions hold:

(4.a) A fault occurs on a fault-free stage-y module,

(4.b) $\ell + 1(y)$ is in $L$, and

(4.c) Either

(4.c.1) The new faulty module is not in use (i.e., it is a spare module), or

(4.c.2) No latent in-use module satisfies condition (3.d).
An analysis of the above conditions leads to the following observations:

• Given that (3.a) holds, condition (4.c) is complementary to the set of conditions (3.c) and (3.d), and so

$$ \lambda^{(1)}(t \mid \ell, \ell + 1(y)) + \lambda^{(2)}(t \mid \ell, \ell + 1(y)) = \lambda^*(t \mid \ell, \ell + 1(y)); $$

• The transitions from $G(\ell)$ to $H(\ell + 1(y))$, from $H(\ell)$ to $H(\ell + 1(y))$, and from $\ell$ to $\ell + 1(y)$ under perfect coverage depend on the vector $\ell$ of faults but not on the coverage status of the system. Thus the rate for all three transitions is given by

$$ \lambda^*(t \mid \ell, \ell + 1(y)) = \big(n(y) - \ell(y)\big) \sum_j \lambda(t \mid y_j). $$
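The perfect-coverage rate given above is simple to evaluate once each fault category is specified. The Python sketch below assumes the Weibull occurrence rate of Section 3.1.1 for every category; the function name and the `(lambda, omega)` pair representation of a category are hypothetical.

```python
from typing import Sequence, Tuple


def perfect_coverage_rate(t: float, n_y: int, ell_y: int,
                          categories: Sequence[Tuple[float, float]]) -> float:
    """lambda*(t | l, l+1(y)) = (n(y) - l(y)) * sum_j lambda(t | y_j).

    categories holds (lambda(y_j), omega(y_j)) pairs for the Weibull rate
    lambda(t | y_j) = lambda(y_j) * omega(y_j) * t**(omega(y_j) - 1).
    """
    per_module = sum(lam * om * t ** (om - 1.0) for lam, om in categories)
    return (n_y - ell_y) * per_module  # every fault-free module is exposed
```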

In the present version of CARE-III, the conditions that define transitions from $G(\ell)$ to $F(\ell + 1(y))$ are replaced by

(3.a') A fault occurs on a fault-free stage-y module,

(3.b') $\ell + 1(y)$ is in $L$,

(3.c') The new faulty module is in use, and

(3.d') A latent in-use, non-benign module is critically paired with the new faulty module.

Condition (3.d') holds whenever (3.d) does, since it also counts interacting modules that are in error. These new conditions therefore allow a simpler evaluation of the rates $\lambda^{(2)}$ and lead to a conservative value of the reliability of the system.

3.2.3 Assumptions on Stochastic Properties 

The original stochastic model is a mixture of time-dependent rates for the occurrence of faults and semi-Markov processes for the coverage dynamics.

The discussion in Section 2.2 on the speed of the coverage dynamics compared with that of the occurrence of faults suggests that internal transitions within aggregate states occur at much faster rates than those between such states. Thus the dynamics within these states can be assumed to happen instantaneously, and the Aggregate Model is well approximated by a non-homogeneous Markov process.

3.2.4 Model Equations and Solutions 

Assuming that the rates for the Aggregate Model are known (their derivation is given in Section 3.4), the results given in the Appendix for non-homogeneous Markov processes are used to calculate the state probabilities and the reliability of the system.

Let the state probabilities be:

$P(t \mid \ell)$: probability that the system is in state $G(\ell)$ at time t;

$Q(t \mid \ell)$: probability that the system is in state $F(\ell)$ at time t;

$S(t \mid \ell)$: probability that the system is in state $H(\ell)$ at time t.

These probabilities are conditional on the system being fault free at time 0.


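The corresponding forward differential equations are then:

$$ \frac{d}{dt} P(t \mid \ell) = -P(t \mid \ell)\,\Lambda(t \mid \ell) + \sum_x P(t \mid \ell - 1(x))\,\lambda^{(1)}(t \mid \ell - 1(x), \ell), \qquad \text{(3.2-1)} $$

$$ \frac{d}{dt} Q(t \mid \ell) = P(t \mid \ell)\,\mu(t \mid \ell) + \sum_x P(t \mid \ell - 1(x))\,\lambda^{(2)}(t \mid \ell - 1(x), \ell), \qquad \text{(3.2-2)} $$

with

$$ \Lambda(t \mid \ell) = \mu(t \mid \ell) + \sum_x \lambda^*(t \mid \ell, \ell + 1(x)), $$

and

$$ \frac{d}{dt} S(t \mid \ell) = -S(t \mid \ell)\,\Lambda^*(t \mid \ell) + \sum_x \left[P(t \mid \ell - 1(x)) + S(t \mid \ell - 1(x))\right] \lambda^*(t \mid \ell - 1(x), \ell), \qquad \text{(3.2-3)} $$

with

$$ \Lambda^*(t \mid \ell) = \Lambda(t \mid \ell) - \mu(t \mid \ell). $$

The equivalent forward integral equations are:

$$ P(t \mid \ell) = \sum_x \int_0^t P(u \mid \ell - 1(x))\,\lambda^{(1)}(u \mid \ell - 1(x), \ell)\,\exp\left[-\Lambda(u, t \mid \ell)\right] du, \qquad \text{(3.2-4)} $$

$$ Q(t \mid \ell) = \int_0^t \left[P(u \mid \ell)\,\mu(u \mid \ell) + \sum_x P(u \mid \ell - 1(x))\,\lambda^{(2)}(u \mid \ell - 1(x), \ell)\right] du, \qquad \text{(3.2-5)} $$

and

$$ S(t \mid \ell) = \int_0^t \sum_x \left[P(u \mid \ell - 1(x)) + S(u \mid \ell - 1(x))\right] \lambda^*(u \mid \ell - 1(x), \ell)\,\exp\left[-\Lambda^*(u, t \mid \ell)\right] du, \qquad \text{(3.2-6)} $$

with $\Lambda(u, t \mid \ell)$ and $\Lambda^*(u, t \mid \ell)$ denoting the definite integrals of $\Lambda(\cdot \mid \ell)$ and $\Lambda^*(\cdot \mid \ell)$ from u to t, respectively.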
The reliability of the system is then

$$ R(t) = \sum_{\ell \in L} P(t \mid \ell), \qquad \text{(3.2-7)} $$

or, equivalently, the unreliability of the system is

$$ 1 - R(t) = \sum_{\ell \in L} Q(t \mid \ell) + \sum_{\ell \in \bar{L}} S(t \mid \ell). \qquad \text{(3.2-8)} $$
3.3 CARE III SOLUTION


3.3.1 Perfect Coverage Model 

In Section 3.2 it was seen that the probabilities $P(t \mid \ell)$ needed to calculate the reliability of the system are obtained from a system of ordinary differential equations. These can be solved recursively by successive increments in the vector $\ell$.

Since, as has been repeatedly observed in this discussion, the systems of concern here are highly reliable, $\lambda^{(1)}(t \mid \ell, \ell + 1(y))$ must in general be much larger than $\lambda^{(2)}(t \mid \ell, \ell + 1(y))$, and $\Lambda^*(t \mid \ell)$ must be large compared to $\mu(t \mid \ell)$. Thus $\lambda^{(1)}(t \mid \ell, \ell + 1(y))$ is close to $\lambda^*(t \mid \ell, \ell + 1(y))$, and $\Lambda(t \mid \ell)$ is close to $\Lambda^*(t \mid \ell)$. So equation (3.2-1) can be replaced by

$$ \frac{d}{dt} P^*(t \mid \ell) = -P^*(t \mid \ell)\,\Lambda^*(t \mid \ell) + \sum_x P^*(t \mid \ell - 1(x))\,\lambda^*(t \mid \ell - 1(x), \ell). \qquad \text{(3.3-1)} $$

As mentioned in Section 3.2, the rates $\lambda^*$ correspond to the perfect coverage case, and so $P^*(t \mid \ell)$ given by (3.3-1) represents the probability of $\ell$ faults at time t given perfect coverage.

Under perfect coverage, the interactions between stages are not relevant, and so the fault statuses of the stages are independent of each other. It follows that

$$ P^*(t \mid \ell) = \prod_x \binom{n(x)}{\ell(x)} \left[1 - r(t \mid x)\right]^{\ell(x)} \left[r(t \mid x)\right]^{n(x) - \ell(x)}, \qquad \text{(3.3-2)} $$
where

$$ r(t \mid x) = \exp\left[-\int_0^t \sum_j \lambda(u \mid x_j)\,du\right] \qquad \text{(3.3-3)} $$

is the reliability of a stage-x module.

Formula (3.3-2) can also be obtained directly by solving equation (3.3-1). 
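This equivalence can be checked numerically. The Python sketch below assumes a single stage (so the product over x has one factor) with one fault category of constant rate $\lambda$, for which $\lambda^*(t \mid \ell - 1, \ell) = (n - \ell + 1)\lambda$ and $\Lambda^*(t \mid \ell) = (n - \ell)\lambda$. It integrates (3.3-1) with the classical fourth-order Runge-Kutta scheme and compares the result with the binomial formula; the function names are hypothetical.

```python
import math
from typing import List


def binomial_pstar(t: float, n: int, lam: float) -> List[float]:
    """P*(t|l), l = 0..n, from (3.3-2) for one stage with constant rate lam."""
    r = math.exp(-lam * t)  # (3.3-3) with one category and omega = 1
    return [math.comb(n, ell) * (1 - r) ** ell * r ** (n - ell) for ell in range(n + 1)]


def solve_pstar(t: float, n: int, lam: float, steps: int = 10_000) -> List[float]:
    """Integrate (3.3-1) for the same stage with classical 4th-order Runge-Kutta."""
    def deriv(p: List[float]) -> List[float]:
        out = []
        for ell in range(n + 1):
            d = -p[ell] * (n - ell) * lam  # -P*(t|l) Lambda*(t|l)
            if ell > 0:
                d += p[ell - 1] * (n - ell + 1) * lam  # P*(t|l-1) lambda*(t|l-1, l)
            out.append(d)
        return out

    h = t / steps
    p = [1.0] + [0.0] * n  # fault free at time 0
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([a + 0.5 * h * b for a, b in zip(p, k1)])
        k3 = deriv([a + 0.5 * h * b for a, b in zip(p, k2)])
        k4 = deriv([a + h * b for a, b in zip(p, k3)])
        p = [a + h / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(p, k1, k2, k3, k4)]
    return p
```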

3.3.2 Approximate Reliability

The approach taken in CARE III is to calculate $Q(t \mid \ell)$ and $S(t \mid \ell)$ by using $P^*(t \mid \ell)$ instead of $P(t \mid \ell)$ in equations (3.2-2) and (3.2-3). The effect of this approximation is to assume that the system has been operating under perfect coverage up to time t. This is represented in Figures 3.3-1a and 3.3-1b. Analytically, it follows that $P^*(t \mid \ell)$ is larger than $P(t \mid \ell)$, and so the approximate values of $Q(t \mid \ell)$ and $S(t \mid \ell)$ are larger than those obtained from equations (3.2-2) and (3.2-3). Hence, a conservative value of the reliability is obtained.

The new equation for $S(t \mid \ell)$ can be shown to be equivalent to equation (3.3-1), and so $S(t \mid \ell)$ is approximated by $P^*(t \mid \ell)$.

The CARE III approach can be summarized by the following steps:

• calculate $P^*(t \mid \ell)$ using equation (3.3-2);

• calculate $Q(t \mid \ell)$ using

$$ Q(t \mid \ell) = \int_0^t \left[P^*(u \mid \ell)\,\mu(u \mid \ell) + \sum_x P^*(u \mid \ell - 1(x))\,\lambda^{(2)}(u \mid \ell - 1(x), \ell)\right] du; \qquad \text{(3.3-4)} $$

• calculate the unreliability of the system by

$$ 1 - R(t) = \sum_{\ell \in L} Q(t \mid \ell) + \sum_{\ell \in \bar{L}} P^*(t \mid \ell). \qquad \text{(3.3-5)} $$
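The steps above can be sketched in Python for a single stage with one constant-rate fault category. In CARE-III the rates $\mu$ and $\lambda^{(2)}$ are derived from the coverage models (Section 3.4); here they are replaced by hypothetical placeholders, $\mu(u \mid \ell) = \mu_0 \ell$ and $\lambda^{(2)}(u \mid \ell - 1, \ell) = \kappa (n - \ell + 1)\lambda$, purely to show the structure of the computation. The integral in (3.3-4) is evaluated with the trapezoid rule, and the spares-exhaustion set is taken to be $\ell > n - m$.

```python
import math


def pstar(u: float, n: int, lam: float, ell: int) -> float:
    """P*(u|l) from (3.3-2): one stage, one category with constant rate lam."""
    r = math.exp(-lam * u)
    return math.comb(n, ell) * (1 - r) ** ell * r ** (n - ell)


def unreliability(t: float, n: int, m: int, lam: float,
                  mu_per_fault: float, kappa: float, steps: int = 2000) -> float:
    """Approximate 1 - R(t) for a single stage following the CARE III steps.

    Hypothetical rates, for illustration only:
      mu(u|l)           = mu_per_fault * l
      lambda2(u|l-1, l) = kappa * (n - l + 1) * lam
    """
    h = t / steps
    grid = [k * h for k in range(steps + 1)]
    total = 0.0
    for ell in range(n - m + 1):  # l in L: at least m modules still operational
        f = []
        for u in grid:
            v = pstar(u, n, lam, ell) * mu_per_fault * ell
            if ell > 0:
                v += pstar(u, n, lam, ell - 1) * kappa * (n - ell + 1) * lam
            f.append(v)
        total += h * (sum(f) - 0.5 * (f[0] + f[-1]))  # trapezoid rule for Q(t|l)
    # l in the spares-exhaustion set: fewer than m operational modules
    total += sum(pstar(t, n, lam, ell) for ell in range(n - m + 1, n + 1))
    return total
```

With both placeholder rates set to zero the result reduces to the pure spares-exhaustion probability; any positive coverage-failure rate can only increase it.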


[Diagram: transitions into $G(\ell')$ and $G(\ell)$, labeled "Perfect"]

Figure 3.3-1a Approximations of State Probabilities $Q(t \mid \ell)$


[Diagram: $G(\ell)$ to $H(\ell)$, labeled "Assumed Perfect for Transition to H-state"]

Figure 3.3-1b Approximations of State Probabilities $S(t \mid \ell)$
3.4 TRANSITION RATES


3.4.1 Approximations Used 

In the original detailed description of the system, it is known at each time which modules have experienced a fault, the category that caused the fault and the coverage status (active, benign, ...). Furthermore, it is known which of the faulty modules form critical pairs.

In the reduced model this level of detail has been lost. The rates of interest will then be obtained by first calculating the corresponding conditional rates given some detailed fault structure and then integrating with respect to the probability distribution of such structure, i.e., $r(t) = E\left[r(t \mid h_t)\right]$, where r(t) is the rate of interest and $h_t$ is the detailed history of the process up to time t.

In the rest of this section $Y_t$ denotes the state of the system at time t. So $Y_t = A$ denotes the occurrence of event A at time t, and $P(Y_t = A)$ the corresponding probability.

Two basic properties are used to derive the aggregate rates:

P1 If A and B are any two states, where A is the aggregate of simpler states $A_i$, and if $r(t)$ and $r_i(t)$ denote the rates corresponding to transitions from states A and $A_i$, respectively, to state B, then

$$r(t) \le \sum_i r_i(t)\, P[Y_t = A_i \mid Y_t = A],$$

with equality if the states $A_i$ are disjoint.

P2 Let $T_1, T_2, \ldots, T_n$ be independent and competing transition times that occur with rates $r_i(t)$. Then the transition rate $r(t)$ of the smallest transition time is given by the sum of the individual rates, $r(t) = \sum_i r_i(t)$.
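Property P2 is the standard competing-risks result. The survival function of $\min_i T_i$ is the product of the individual survival functions, so its hazard, $-\frac{d}{dt}\log S_{\min}(t)$, is the sum of the individual hazards. The sketch below checks this numerically for hazards of the Weibull form used later in (3.5-1). The parameter values are arbitrary illustrations, and the helper names are hypothetical.

```python
import math

def weibull_hazard(t, lam, omega):
    """Hazard omega * lam**omega * t**(omega - 1), the form used in (3.5-1)."""
    return omega * lam ** omega * t ** (omega - 1)

def weibull_survival(t, lam, omega):
    """Survival function exp(-(lam * t)**omega) belonging to weibull_hazard."""
    return math.exp(-(lam * t) ** omega)

def hazard_of_minimum(t, params, h=1e-6):
    """Hazard of min(T_1, ..., T_n) for independent T_i.

    The survival function of the minimum is the product of the individual
    survival functions; its hazard is -d/dt log S_min(t), estimated here
    by a central difference.
    """
    def log_s_min(u):
        return math.log(math.prod(weibull_survival(u, lam, om) for lam, om in params))
    return -(log_s_min(t + h) - log_s_min(t - h)) / (2.0 * h)

params = [(1e-3, 1.0), (2e-4, 1.5), (5e-4, 0.8)]  # illustrative (lambda, omega) pairs
t = 10.0
numeric = hazard_of_minimum(t, params)
summed = sum(weibull_hazard(t, lam, om) for lam, om in params)
```

`numeric` and `summed` agree to within the finite-difference error, which is the content of P2.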



In the derivation of the formulas for the aggregate rates, property P1 can be used by disaggregating the $G(\ell)$ state into its $(d,\ell,c)$ components and considering all the possible choices of in-use (non-spare) modules. The choices that are possible are not equally likely, and some choices would imply past failure of the system. For example, if $(d,\ell,c)$ determines a critical pair of latent, in-use modules whose DFCM state is DF, the system must already have failed. Considering only the truly possible choices of in-use modules, given that the system is operational, together with how each of these affects the aggregate rates, would entail an analysis of the detailed past history of the process. Such an approach defeats the purpose of the aggregation and decomposition steps taken to decrease the size of the state space.

The following assumption is thus made: 

(A1) Given $\ell$ faults in the system, all choices of spare and in-use groups of modules are possible, independently of the failure status of the system.

This assumption implies that all states $(d,\ell,c)$ with the same number of latent modules, and the same number of latent, in-use modules, are equally likely. Hence, all of these contribute in the same way to the aggregate rates. Since this procedure includes more critical pairs than are truly possible, overestimates of the aggregate rates are obtained.

Condition A1 is based on the conservative assumption that the system has been operating under perfect coverage until the present. Furthermore, A1 is consistent both with the approach taken in Section 3.3 to calculate the failure state probabilities, $Q(t \mid \ell)$ and $S(t \mid \ell)$, and with the discussion given in J.J. Stiffler, L.A. Bryant and L. Guccione (1979, pp. 32-34).
3.4.2 Calculation of $\lambda^{(2)}(t)$


Let $\lambda^{(2)}(t \mid \ell, \ell + 1(y))$, or simply $\lambda^{(2)}(t)$, denote the rate of a transition at time t from state $G(\ell)$ to the failure state $F(\ell + 1(y))$. Since $G(\ell)$ is the aggregate of states of the form $(d,\ell,c)$, using property P1 in Section 3.4.1, it follows that

$$\lambda^{(2)}(t) = \sum_{(d,c)} \lambda^{(2)}(t \mid d,c)\, P[Y_t = (d,c) \mid Y_t = G(\ell)],$$

where $\lambda^{(2)}(t \mid d,c)$ denotes the rate of a transition from $(d,c)$ to $F(\ell + 1(y))$. Such a transition occurs when the first fault-free, in-use stage-y module, say $(y,b)$, suffers a fault of some category, say $y_j$. Using property P2, it follows that

$$\lambda^{(2)}(t) = \sum_{(d,c)} \sum_{b,j} \lambda(t \mid y_j)\, P\big[Y_t = (d,c) \text{ and } (y,b) \text{ is fault-free, in-use} \mid Y_t = G(\ell)\big].$$



As described in Section 3.2.2, a transition to the failure state occurs if there is a non-benign, in-use module that is critically paired with the module $(y,b)$.


Again using property P1, choosing all possible in-use faulty modules $(x,a)$ that form a lethal combination with $(y,b)$, and all possible vectors $\mu$ of latent modules, gives

$$\lambda^{(2)}(t) \le \sum_{\mu} \sum_{(d,c)} \sum_{b,j} \sum_{x,a} \lambda(t \mid y_j)\, P\big[Y_t = (\mu,\ell) \mid Y_t = G(\ell)\big]\, P\big[Y_t = (d,c) \text{ and } C(a,b) \mid Y_t = (\mu,\ell)\big],$$

where $C(a,b)$ denotes the occurrence of the event: module $(x,a)$ is non-benign, module $(y,b)$ is fault-free, and both form an in-use critical pair.


Assumption A1 implies that the statuses of the stages are mutually independent and depend on the number of faults $\ell$, but not on the failure status of the system. Furthermore, $\mu(x)$ and $n(y) - \ell(y)$ are upper bounds for the numbers of latent, in-use stage-x modules and of fault-free, in-use stage-y modules, respectively. It then follows that


$$\lambda^{(2)}(t \mid \ell, \ell + 1(y)) \le \Big[\sum_j \lambda(t \mid y_j)\Big] \big[n(y) - \ell(y)\big] \sum_x \frac{H_{\bar B}(t \mid x)}{H_L(t \mid x)} \sum_{\mu} \mu(x)\, b^{(2)}_{x,y}(\mu,\ell)\, P\big[Y_t = (\mu(x),\mu(y)) \mid Y_t = (\ell(x),\ell(y))\big],$$

where

$H_{\bar B}(t \mid x)$: probability that a given stage-x module has a non-benign fault at time t;

$H_L(t \mid x)$: probability that a given stage-x module has a latent fault at time t;

$b^{(2)}_{x,y}(\mu,\ell)$: probability that a given pair of (x,y) modules is critical when chosen from existing latent, in-use stage-x modules and fault-free, in-use stage-y modules, given $\mu$ latent and $\ell$ faulty modules.

The H functions are evaluated by conditioning on the time of occurrence of the $x_i$ fault. The $b^{(2)}_{x,y}$ function is evaluated by conditioning on the number of latent in-use modules in stages x and y.


In summary, the rate for a transition from $G(\ell)$ to $F(\ell + 1(y))$ is given by

$$\lambda^{(2)}(t \mid \ell, \ell + 1(y)) = \Big[\sum_j \lambda(t \mid y_j)\Big] \big(n(y) - \ell(y)\big)\, c(t \mid y, \ell), \qquad (3.4\text{-}1)$$



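with

$$c(t \mid y, \ell) = \sum_x \frac{H_{\bar B}(t \mid x)}{H_L(t \mid x)}\, D(t,(x,y) \mid \ell), \qquad (3.4\text{-}2)$$

where $D(t,(x,y) \mid \ell)$ is defined by

$$D(t,(x,y) \mid \ell) = \begin{cases} \displaystyle\sum_{\mu(x),\mu(y)} \mu(x)\, b^{(2)}_{x,y}(\mu,\ell)\, P[\mu(x),t \mid \ell(x)]\, P[\mu(y),t \mid \ell(y)] & \text{if } x \ne y, \\ \displaystyle\sum_{\mu(y)} \mu(y)\, b^{(2)}_{y,y}(\mu,\ell)\, P[\mu(y),t \mid \ell(y)] & \text{if } x = y, \end{cases} \qquad (3.4\text{-}3)$$

$$H_{\bar B}(t \mid x) = \sum_i H_{\bar B}(t \mid x_i), \qquad (3.4\text{-}4)$$

$$H_{\bar B}(t \mid x_i) = \int_0^t \lambda(u \mid x_i)\, r(u \mid x)\, P_{\bar B}(t-u \mid x_i)\, du, \qquad (3.4\text{-}5)$$

$$H_L(t \mid x) = \sum_i H_L(t \mid x_i), \qquad (3.4\text{-}6)$$

$$H_L(t \mid x_i) = \int_0^t \lambda(u \mid x_i)\, r(u \mid x)\, P_L(t-u \mid x_i)\, du, \qquad (3.4\text{-}7)$$

where $P_{\bar B}$ and $P_L$ are both obtained from the Single Fault Coverage Model for fault category $x_i$.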


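$$P(\mu(x),t \mid \ell(x)) = \binom{\ell(x)}{\mu(x)} \big(1 - a(t \mid x)\big)^{\ell(x)-\mu(x)}\, a(t \mid x)^{\mu(x)}, \qquad (3.4\text{-}8)$$

$$a(t \mid x) = \frac{H_L(t \mid x)}{1 - r(t \mid x)}, \qquad (3.4\text{-}9)$$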
• 2 N (q(x), q(y) | j) [l+S (x,y) 8 (j,l)] 

j=l 

. /q(x)-j- S(x,y\ 

\ 1- -j / 

$N(q(x),q(y) \mid j)$ = number of sets of j critical pairs that couple a fixed y module, among the first q(y) modules, with j distinct stage-x modules, among the first q(x), (3.4-12)

$$q(x) = q(i,x) \ \text{ if } \ NOP(x) = (q(1,x), q(2,x), \ldots) \ \text{ and } \ q(i-1,x) > n(x) - \ell(x) + \mu(x) > q(i,x). \qquad (3.4\text{-}13)$$

(Note: $b^{(2)}_{x,y}(\mu,\ell)$ is not implemented in CARE III, Version 3.)
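The chain (3.4-8), (3.4-3), (3.4-2), (3.4-1) can be evaluated mechanically once its inputs are known. In the sketch below, $b^{(2)}_{x,y}$ is replaced by a caller-supplied function `b2` (hypothetical), because that factor is not implemented in CARE III, Version 3. Its inputs, such as $a(t \mid x)$ and the ratio $H_{\bar B}(t \mid x)/H_L(t \mid x)$, are assumed to have been computed elsewhere. All function names are illustrative.

```python
from math import comb

def p_latent(mu, l, a):
    """(3.4-8): probability that mu of the l faulty stage-x modules are latent,
    each being latent with probability a = a(t|x)."""
    return comb(l, mu) * (1.0 - a) ** (l - mu) * a ** mu

def d_xy(l_x, l_y, a_x, a_y, b2, same_stage):
    """(3.4-3): D(t,(x,y)|l). b2(mu_x, mu_y) is a stand-in for b^(2)_{x,y}(mu, l)."""
    if same_stage:
        return sum(mu * b2(mu, mu) * p_latent(mu, l_y, a_y) for mu in range(l_y + 1))
    return sum(
        mu_x * b2(mu_x, mu_y) * p_latent(mu_x, l_x, a_x) * p_latent(mu_y, l_y, a_y)
        for mu_x in range(l_x + 1)
        for mu_y in range(l_y + 1)
    )

def lambda2(rates_y, n_y, l_y, terms):
    """(3.4-1) with (3.4-2): rates_y lists lambda(t|y_j) over categories j;
    terms lists (H_nonbenign(t|x) / H_L(t|x), D(t,(x,y)|l)) for each stage x."""
    c = sum(ratio * d for ratio, d in terms)
    return sum(rates_y) * (n_y - l_y) * c

# With b2 == 1 the x != y case reduces to E[mu(x)] = l(x) * a(t|x).
one = lambda mu_x, mu_y: 1.0
d = d_xy(3, 2, 0.2, 0.5, one, same_stage=False)  # 3 * 0.2
```

The constant `b2` is only a sanity check: with every pair critical, $D$ collapses to the expected number of latent stage-x modules.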

3.4.3 Calculation of $\mu(t)$

Let $\mu(t \mid \ell)$ denote the rate of a transition at time t from state $G(\ell)$ to state $F(\ell)$. As described in Section 3.2.2, two types of changes contribute to such a transition:

(i) a single latent in-use module in error state E propagates without detection (i.e., transition into state F in the corresponding SFCM); or

(ii) a latent in-use critical pair in state AB moves to DF in the corresponding DFCM.

Let $a'(t \mid \ell)$ and $A'(t \mid \ell)$ denote, respectively, the ensemble rates due to each of the two types of changes. It then follows that their sum is an upper bound for the rate $\mu(t \mid \ell)$.

Using properties P1 and P2 in Section 3.4.1, it follows that $a'(t \mid \ell)$ is given by the expression

$$\sum_d \sum_{x,a} a'(t \mid x,a)\, P\big[Y_t = d;\ (x,a) \text{ in-use, latent} \mid Y_t = G(\ell)\big],$$

where $a'(t \mid x,a)$ is the rate of a transition from $G(\ell)$ to $F(\ell)$ due to error propagation in module $(x,a)$.

By conditioning on the category that caused the fault on module $(x,a)$, and on the time of occurrence of the fault, one obtains

$$a'(t \mid x,a) = \sum_i \frac{h_F(t \mid x_i)}{H_L(t \mid x)},$$

where $H_L(t \mid x)$ is as defined in Section 3.4.2, and $h_F(t \mid x_i)$ is the rate of error propagation failure due to a category-$x_i$ fault.


Since

$$\sum_d \sum_a P\big[Y_t = d;\ (x,a) \text{ single, in-use, latent} \mid Y_t = G(\ell)\big] \le E\big[\text{number of stage-}x \text{ latent modules} \mid Y_t = G(\ell)\big] = \ell(x)\, a(t \mid x),$$

where $a(t \mid x)$ is defined in Section 3.4.2, and since the factor $H_L(t \mid x)$ then cancels because $a(t \mid x) = H_L(t \mid x)/(1 - r(t \mid x))$, it follows that

$$a'(t \mid \ell) \le \sum_x \sum_i \ell(x)\, \frac{h_F(t \mid x_i)}{1 - r(t \mid x)},$$

where $h_F(t \mid x_i)$ is evaluated by conditioning on the time of occurrence of the $x_i$ fault.

Similar steps are taken to evaluate $A'(t \mid \ell)$. First, consider all states d that form the aggregate state $G(\ell)$. Then, for each state d, consider all latent critical pairs of faults $(x_i, y_j)$ that can lead to failure of the system. Finally, condition on both the time of occurrence of the first fault and the time of occurrence of the second, given that at that instant the first fault is benign. These steps lead to the following mathematical expression:

$$A'(t \mid \ell) = \sum_{x,i} \sum_{y,j} \frac{h_{DF}(t \mid x_i, y_j)}{H_L(t \mid x)\, H_L(t \mid y)}\, B(t,(x,y) \mid \ell),$$

where

$H_L(t \mid x)$: as defined in Section 3.4.2;

$h_{DF}(t \mid x_i, y_j)$: rate at which an $(x_i, y_j)$ critical pair causes system failure;

$B(t,(x,y) \mid \ell)$: expected number of times a given (x,y) pair is latent, in-use and critical at time t, given $\ell$ faults.
As was the case for $\lambda^{(2)}(t)$, the evaluation of $B(t,(x,y) \mid \ell)$ requires detailed analysis of the state space. Use of assumption A1 then leads to the estimate

$$B(t,(x,y) \mid \ell) = \sum_{\mu(x),\mu(y)} b^{(1)}_{x,y}(\mu,\ell)\, \mu(x)\, \mu(y)\, P\big[Y_t = (\mu(x),\mu(y)) \mid Y_t = (\ell(x),\ell(y))\big],$$

where

$b^{(1)}_{x,y}(\mu,\ell)$: probability that a given (x,y) pair is critical when chosen from existing latent, in-use stage-x and stage-y modules, given $\mu$ latent and $\ell$ faulty modules.

The functions $h_F$, $H_B$ and $H_L$ are evaluated by conditioning on the time of occurrence of the $x_i$ fault. The function $h_{DF}$ is evaluated by conditioning on the time of occurrence of the $y_j$ fault, given that the first fault is benign. The function $b^{(1)}_{x,y}$ is obtained by conditioning on the number of latent in-use modules in stages x and y.


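In summary, the rate for a transition from $G(\ell)$ into $F(\ell)$ is given by

$$\mu(t \mid \ell) = a'(t \mid \ell) + A'(t \mid \ell), \qquad (3.4\text{-}14)$$

where

$$a'(t \mid \ell) = \sum_{x,i} \ell(x)\, \frac{h_F(t \mid x_i)}{1 - r(t \mid x)}, \qquad (3.4\text{-}15)$$

$$A'(t \mid \ell) = \sum_{x,i} \sum_{y,j} \frac{h_{DF}(t \mid x_i, y_j)}{H_L(t \mid x)\, H_L(t \mid y)}\, B(t,(x,y) \mid \ell), \qquad (3.4\text{-}16)$$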
with

$$h_F(t \mid x_i) = \int_0^t \lambda(u \mid x_i)\, r(u \mid x)\, P_F(t-u \mid x_i)\, du,$$

$$r(t \mid x) = \exp\Big[-\int_0^t \sum_i \lambda(u \mid x_i)\, du\Big],$$

$$h_{DF}(t \mid x_i, y_j) = \int_0^t H_B(u \mid x_i)\, \lambda(u \mid y_j)\, r(u \mid y)\, P_{DF}(t-u \mid x_i, y_j)\, du,$$

$$H_B(t \mid x_i) = \int_0^t \lambda(u \mid x_i)\, r(u \mid x)\, P_B(t-u \mid x_i)\, du,$$

$$H_L(t \mid x) = \sum_i \int_0^t \lambda(u \mid x_i)\, r(u \mid x)\, P_L(t-u \mid x_i)\, du,$$

$B(t,(x,y) \mid \ell)$ as defined by

$$B(t,(x,y) \mid \ell) = \begin{cases} \displaystyle\sum_{\mu(x),\mu(y)} b^{(1)}_{x,y}(\mu,\ell)\, \mu(x)\, \mu(y)\, P(\mu(x),t \mid \ell(x))\, P(\mu(y),t \mid \ell(y)) & \text{if } x \ne y, \\ \displaystyle\sum_{\mu(y)} b^{(1)}_{y,y}(\mu,\ell)\, \mu(y)\, \big(\mu(y) - 1\big)\, P(\mu(y),t \mid \ell(y)) & \text{if } x = y, \end{cases} \qquad (3.4\text{-}21)$$

and, as in (3.4-9), $a(t \mid x) = H_L(t \mid x) / \big(1 - r(t \mid x)\big)$.
$b^{(1)}_{x,y}(\mu,\ell)$ is defined by (3.4-22):


N(q(x),q(y)| 1) 


q(x)q(y) 


2N(q(y)\q(y)|l) 


q(y) (q(y)-i) 


N(q(x) ,q(y) | 1) ) 


q(x) 


(Note: b^(u, | ) is 

x,y 


/ n(x)- £(x)\ 

[ q(x) ) 

/n(x)- i(x) + /i(x)\ 

\ Q(x) j 


/n(y)- i(y)\ 

\ q(y) j 

/n(y)- £ (y)+/x(yj] 

\ q(y) i 


1 


( n(y u (y) ) W n( fc>r) 

/n(y)- i(y)+/i(y)\ 
l q(y) 


if x = y. 


$N(q(x),q(y) \mid 1)$ = number of (x,y) critical pairs among the first q(x) and q(y) modules, (3.4-23)

$$q(x) = q(i,x) \ \text{ if } \ NOP(x) = (q(1,x), q(2,x), \ldots) \ \text{ and } \ q(i-1,x) > n(x) - \ell(x) + \mu(x) > q(i,x). \qquad (3.4\text{-}24)$$


not implemented in CARE III, Version 3). 
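As a small illustration of the bookkeeping in (3.4-14) through (3.4-16), the sketch below combines precomputed values of $h_F$, $r$, $h_{DF}$, $H_L$ and $B$ into $\mu(t \mid \ell)$ at a single time point. The dictionary layout and function names are hypothetical, and the numbers are arbitrary. Computing the inputs themselves, including the $b^{(1)}_{x,y}$ factor inside $B$, is outside the scope of this sketch.

```python
def a_prime(l, h_f, r):
    """(3.4-15): error-propagation part of mu(t|l).

    l[x]   : number of faults l(x) in stage x
    h_f[x] : list of h_F(t|x_i) over the fault categories i of stage x
    r[x]   : r(t|x)
    Stages with l(x) = 0 contribute nothing and are skipped (this also avoids
    dividing by 1 - r(t|x) = 0 at t = 0).
    """
    return sum(l[x] * sum(h_f[x]) / (1.0 - r[x]) for x in l if l[x] > 0)

def big_a_prime(h_df, h_l, b):
    """(3.4-16): critical-pair part of mu(t|l).

    h_df[(x, y)] : list of h_DF(t|x_i, y_j) over category pairs (i, j)
    h_l[x]       : H_L(t|x)
    b[(x, y)]    : B(t, (x, y)|l)
    """
    return sum(sum(h_df[xy]) / (h_l[xy[0]] * h_l[xy[1]]) * b[xy] for xy in h_df)

def mu_rate(l, h_f, r, h_df, h_l, b):
    """(3.4-14): mu(t|l) = a'(t|l) + A'(t|l)."""
    return a_prime(l, h_f, r) + big_a_prime(h_df, h_l, b)

value = mu_rate(
    l={"x": 2, "y": 1},
    h_f={"x": [1e-5, 2e-5], "y": [3e-5]},
    r={"x": 0.9, "y": 0.95},
    h_df={("x", "y"): [1e-6, 1e-6]},
    h_l={"x": 0.01, "y": 0.02},
    b={("x", "y"): 0.5},
)
```

Grouping the double sum of (3.4-16) by stage pair is valid because $H_L$ and $B$ depend only on the stages x and y, not on the categories i and j.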





3.5 IMPLEMENTATION IN CARE-III 


In the previous sections the formulation of the reliability model was reviewed. The objective of this section is to document the model that is actually implemented and to outline the calculations performed. In the following sections, the overall structure and data flow of the CARE3 program are described in some detail, the solution of the reliability model is outlined, and the basic reliability functions are defined.

3.5.1 Overview of CARE3 Program 

Figure 3.5-1 illustrates the overall data flow for the CARE3 program. The user's input data for the reliability model is read from file CREIN by the input program CAREIN. It includes stage data (parameters N, M, NOP, LC), fault category data for each stage (parameters type, $\omega$, $\lambda$), the system fault tree and any critical pairs fault trees. After the data is checked and preprocessed by CAREIN, it is passed to CARE3 on files RELIN, FT15F and BXYIN. The moments of the coverage functions are also passed to CARE3 on file CVGMTS. The reliability model is solved by CARE3, and the functions $P^*(t \mid \ell)$ and $Q(t \mid \ell)$ are computed. In addition, the reliability functions are passed to the plotting program, RELPLT, on file PLTFL.

Figure 3.5-2 provides a high-level functional description of the CARE3 program; a brief explanation of the functions follows:

• Computation Control

These subroutines control the computations for solving the reliability model; the details of the computational sequence are given in Sections 3.5.2 to 3.5.5. Figures 3.5-3, 3.5-4 and 3.5-5 illustrate the control structure of subroutines CARE3, RLSBRN, NFLTVDP and GNFLTVC with call trees.
• Reliability Functions

The subroutines in this group compute the basic reliability functions used in the solution of the reliability model; definitions of the functions are given in Section 3.5.5.

• Numerical Integration

The subroutines in this group are used to compute numerically the integral (over a time interval) of a function. This kind of calculation is required in the solution of the integral equation for the functions $Q(t \mid \ell)$. The numerical methods used in these subroutines are discussed in detail in Section 3.6.

• Numerical Convolution

The subroutines in this group are used to compute numerically the convolution of two functions. This kind of calculation is required to compute the (time-dependent) transition rates of the reliability model for the situation of non-perfect coverage. These are the most crucial numerical subroutines because they are part of the interface between the coverage and reliability models; the numerical methods used are discussed in detail in Section 3.6.

• Support Functions

These are "library" type subroutines which are used by all other subroutines in the program for very basic operations or calculations.




FIGURE 3.5-2 Functional Structure of CARE3 Program (subroutines shown: COVRGE, CAXLAT, FINTGRT, FHSFST, ARZERO, CRXFF, FFSFST, FLAM, FRXIFF, FHDFST, FNCK, FGST, FFDFST, BUFBLK, FPSTAR, ABCST, BUFFOUT, FPSTREC, BUFFIN, UNRELQ, PRMTGH, SUMHAT, BXYC, FAC, FPMUX, FAPC, FCLAM, FCYJ, FBCRTL, FDSCRTL)








FIGURE 3.5-3 CARE3 Call Tree (routines shown: CARE3, RLSBRN, NFLTVDP, GNFLTVC, FNCK, BUFBLK)
FIGURE 3.5-4 NFLTVDP Call Tree (routines shown: NFLTVDP, CAXLAT, CRXFF, FHSFST, FRXIFF, FCLAM, PREEXP, FGST, FHDFST, ABCST, FFDFST, FFSFST, FNCK, PRMTGH, BUFBLK, BUFFOUT)
FIGURE 3.5-5 GNFLTVC Call Tree (routines shown: GNFLTVC, ARZERO, FPSTAR, UNRELQ, FINTGRT, SUMHAT, BUFFIN, FBCRTL, BXYC, FPMUX, FAPC, FCYJ, FDSCRTL, FLAM, FPSTREC)
3.5.2 System Fault Tree


The preprocessing of the system fault tree is performed by the input program CAREIN. Figure 3.5-6 illustrates the user's input data for the system fault tree (on file CREIN) and the corresponding input for FTREE generated by CAREIN. (Refer to the CARE-III User's Manual for a description of the input data formats.) A brief description of each of the parameters in Figure 3.5-6 follows.

• TITLE : One or more lines of descriptive text.

• IROP : FTREE run option; set to 3 by CAREIN.

• MCOMB : The maximum number of input gate (stage) failures; set to the maximum of 4 and KWT by CAREIN, where KWT is described in the text below.

• PSTRNC : Perfect coverage truncation value; set to 10“^ by 

CAREIN. 


• IFSTG : The number of the first input gate (stage); it must be set to 1 by the user.

• ILSTG : The number of the last input gate (stage); it must be set to NSTGES by the user.

• IFGTE : The number of the first logic gate in the system fault tree.

• ILGTE : The number of the last logic gate in the system fault tree.

• FTHRS : The flight time (in hours); computed by CAREIN from the input variable FT in NAMELIST/RNTIME/.
• ISTG : The number of an input gate (stage); CAREIN generates an input gate for each stage, i.e., ISTG ranges from 1 to NSTGES.

• EFCTLM : The failure rate of an input gate (stage); the CAREIN estimate of the failure rate for a stage is described in the text below.

• IT : FTREE control code, set to 1 by CAREIN; this forces FTREE to consider EFCTLM the failure rate and FTHRS the time used to compute the probability of failure for the input gate (stage).

• IGTE : The number of a logic gate in the system fault tree.

• ITYP : The type of a logic gate; refer to the CARE-III User's Manual for a description of the logic gate types.

• INPUTS : A string of gate numbers listing the inputs to the logic gate.

Figure 3.5-7 illustrates the output data (MINTRM) file generated on unit FT15F when the system fault tree is processed by FTREE. A brief description of each of the parameters in Figure 3.5-7 follows.

• PRBMT : The FTREE estimate of the probability of failure of the system (after a flight time of FTHRS) due to the set of stage failures indicated by the corresponding MINTRM.

• MINTRM : A fault vector (1 = failed, 0 = not-failed) for a failed state of the system; each fault vector has NSTGES components (i.e., one for each stage).
FIGURE 3.5-6 System Fault Tree: FTREE Input (columns: DATA TYPE; USER'S INPUT FILE (CREIN); FTREE INPUT FILE (FT11F); title and control block records shown: TITLE; IROP, MCOMB, PSTRNC; IFSTG, ILSTG, IFGTE, ILGTE; FTHRS)

* Record generated by CAREIN
** Record not used by CARE3 version of FTREE
PRBMT, MINTRM
PRBMT, MINTRM 
PRBMT, MINTRM 


FIGURE 3.5-7 System Fault Tree: FTREE Output 
3.5.3 Critical Pairs Fault Tree(s)

The preprocessing of the critical pairs fault tree(s) is performed by the input program CAREIN. Figure 3.5-8 illustrates the user's input data for a critical pairs fault tree (on file CREIN) and the corresponding input for FTREE generated by CAREIN. (Refer to the CARE-III User's Manual for a description of the input data formats.) A brief description of each of the parameters in Figure 3.5-8 follows:

• TITLE : One or more lines of descriptive text.

• IROP : FTREE run option; set to 3 by CAREIN.

• MCOMB : The maximum number of input gate (unit) failures; set to 2 by CAREIN.

• PSTRNC : Perfect coverage truncation value; set to $10^{-14}$ by CAREIN.

• IFUNT : The number of the first input gate (unit); it must be set by the user to the number of the first unit in the first stage of the tree.

• ILUNT : The number of the last input gate (unit); it must be set by the user to the number of the last unit in the last stage of the tree.

• IFGTE : The number of the first logic gate in the critical pairs fault tree.

• ILGTE : The number of the last logic gate in the critical pairs fault tree.
• FTHRS : The flight time (in hours); computed by CAREIN from the input variable FT in NAMELIST/RNTIME/.

• ISTG : The number of a stage in the critical pairs tree; CAREIN generates (ISLUNT - ISFUNT + 1) FTREE input gates (modules) for each stage.

• ISFUNT : The number of the first input gate (module) in the stage numbered ISTG.

• ISLUNT : The number of the last input gate (module) in the stage numbered ISTG.

• IUNT : The number of an input gate (module) in the stage numbered ISTG; ISFUNT ≤ IUNT ≤ ISLUNT.

• UNTLM : The failure rate of an input gate (module) in the stage numbered ISTG; the CAREIN estimate of the failure rate for the units in a stage is described in the text below.

• IT : FTREE control code, set to 1 by CAREIN; this forces FTREE to consider UNTLM the failure rate and FTHRS the time used to compute the probability of failure of the input gate (module).

• IGTE : The number of a logic gate in the critical pairs fault tree.

• ITYP : The type of a logic gate; refer to the CARE-III User's Manual for a description of the logic gate types.

• INPUTS : A string of gate numbers listing the inputs to the logic gate.
Figure 3.5-9 illustrates the output data (MINTRM) file generated on unit FT25F when a critical pairs fault tree is processed by FTREE. Such a set of records is written to unit FT25F for each of the critical pairs trees input by the user. A brief description of each of the parameters in Figure 3.5-9 follows.

• NUNTS : The total number of modules in the stages in the critical pairs trees (ILUNT - IFUNT + 1).

• PRBMT : The FTREE estimate of the probability of a critical pair system failure due to the set of unit failures indicated by the corresponding MINTRM.

• MINTRM : A fault vector (1 = failed, 0 = not-failed) for a failed state of the system; each fault vector has NUNTS components (i.e., one for each unit).

In Section 3.4, it was shown that the critical pairs fault tree data is used in the calculation of the rate for transitions between the $G(\ell)$ and $F(\ell)$ or $F(\ell + 1(y))$ states. In particular, the factors

(1) $b^{(1)}_{x,y}(\mu,\ell)$,

(2) $b^{(2)}_{x,y}(\mu,\ell)$

depend on the number of (x,y) critical pairs, N(x,y). However, in the CARE-III program, the corresponding calculations involve the factor

$b_{xy}\big(\ell(x)-\mu(x),\ \ell(y)-\mu(y)\big)$,

which is defined verbally on page 6 of the CARE-III Maintenance Manual, L.A. Bryant and J.J. Stiffler (1982a), but is not defined by an equation in any CARE-III document. Review of subroutine CRTLPRS indicates the following definition:
following definition: 


$$b_{xy}\big(\ell(x)-\mu(x),\ \ell(y)-\mu(y)\big) = \begin{cases} \dfrac{k_{xy}\big(\ell(x)-\mu(x),\ \ell(y)-\mu(y)\big)}{\big(n(x)-\ell(x)+\mu(x)\big)\big(n(y)-\ell(y)+\mu(y)\big)} & x \ne y, \\[2ex] \dfrac{k_{xy}\big(\ell(x)-\mu(x),\ \ell(x)-\mu(x)\big)}{\big(n(x)-\ell(x)+\mu(x)\big)\big(n(x)-\ell(x)+\mu(x)-1\big)} & x = y, \end{cases}$$

where $k_{xy}\big(\ell(x)-\mu(x), \ell(y)-\mu(y)\big)$ is the number of (x,y) critical pairs (x,a), (y,b) such that there are at least $\ell(x)-\mu(x)$ modules (x,a') and at least $\ell(y)-\mu(y)$ modules (y,b') for which

$a < a'$, $b < b'$.


Note that, since the counts $k_{xy}$ depend on the numbers assigned to the modules, the $b_{xy}$, and hence the solution of the reliability model, will depend on the numbering of the modules. This situation is inconsistent with the assumption that all modules within a stage are identical. The calculation of $k_{xy}$ is further complicated by logic which depends on the (user-supplied) NOP data; BCS has not been able to make a reasonable interpretation of this logic. The (user-supplied) LC data is available, but not used, in the $k_{xy}$ calculation.
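To make the numbering dependence concrete, the sketch below implements the $x \ne y$ case of $b_{xy}$ with $k_{xy}$ taken from the verbal definition above. It assumes that the modules of a stage are numbered consecutively from 1, so that the number of modules (x,a') with $a < a'$ is $n(x) - a$. The NOP-dependent logic in CRTLPRS is not reproduced, and the function names are hypothetical. The same physical coupling, labelled two ways, gives two different values of $b_{xy}$.

```python
def k_xy(critical_pairs, n_x, n_y, p, q):
    """Count of (x, y) critical pairs (a, b) having at least p stage-x modules
    numbered above a and at least q stage-y modules numbered above b.
    Modules in a stage are assumed to be numbered 1..n."""
    return sum(1 for a, b in critical_pairs if n_x - a >= p and n_y - b >= q)

def b_xy(critical_pairs, n_x, n_y, l_x, mu_x, l_y, mu_y):
    """The x != y case of b_xy(l(x) - mu(x), l(y) - mu(y))."""
    k = k_xy(critical_pairs, n_x, n_y, l_x - mu_x, l_y - mu_y)
    return k / ((n_x - l_x + mu_x) * (n_y - l_y + mu_y))

# Three modules per stage, one detected fault in each stage, no latent faults.
# The same single critical coupling, labelled two different ways:
low_numbers = b_xy([(1, 1)], 3, 3, 1, 0, 1, 0)
high_numbers = b_xy([(3, 3)], 3, 3, 1, 0, 1, 0)
```

Under this reading, coupling modules 1 and 1 gives 0.25, while coupling modules 3 and 3 gives 0.0. This is the dependence on module numbering described above.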
FIGURE 3.5-8 Critical Pairs Fault Tree: FTREE Input (columns: USER'S INPUT FILE (CREIN); FTREE INPUT FILE (FT11F); records shown: TITLE AND CONTROL BLOCK; IROP, MCOMB, PSTRNC; IFUNT, ILUNT, IFGTE, ILGTE; FTHRS; IGTE, ITYP, INPUTS TO GATE)

* Record generated by CAREIN
** Record not used by CARE3 version of FTREE
+ Not a standard FTREE input block record; (ISLUNT-ISFUNT+1) FTREE input block records generated per second
* NUNTS
PRBMT, MINTRM
PRBMT, MINTRM
PRBMT, MINTRM

* Record generated by CAREIN

FIGURE 3.5-9 Critical Pairs Fault Tree: FTREE Output
3.5.4 Outline of Calculations


The evaluation of the system reliability R(t) is performed by the main program CARE3. The calculation is partitioned into "SUBRUNs", each of which evaluates the reliability of a subsystem; the subsystems are independent in the sense that modules in different subsystems are not critically coupled as defined by the critical pairs tree(s). (The possible coupling of subsystems by the system fault tree does not appear to be considered.) For each SUBRUN, the calculation of reliability is controlled by RLSBRN. RLSBRN calls NFLTVDP and GNFLTVC to generate all fault vectors $\ell$ for the subsystem, to partition the fault vectors into two disjoint subsets $L_s$ and $U_s$, and to compute $Q(t \mid \ell)$ for $\ell \in L_s$ and $P^*(t \mid \ell)$ for $\ell \in U_s$. The calculations performed by NFLTVDP and GNFLTVC are outlined in Tables 3.5-1 and 3.5-2, respectively, and all basic reliability functions used in the calculations are defined in Section 3.5.5.

The calculation of the system unreliability as a function of SUBRUN 
results is implemented in GNFLTVC and CARE3 and depends on whether or not 
the user supplies a system fault tree. BCS has carefully reviewed the 
program logic and determined that the following equations define the 
calculation of system unreliability that is actually implemented in CARE- 
III. 

• No System Fault Tree

In this case, the system unreliability is computed by GNFLTVC as follows:

$$1 - R(t) = \sum_{\text{SUBRUNs}} \Big[ \sum_{\ell \in L_s} Q(t \mid \ell) + \sum_{\ell \in U_s} P^*(t \mid \ell) \Big], \qquad (3.5\text{-}1)$$

where

$$Q(t \mid \ell) = \int_0^t K(\tau \mid \ell)\, d\tau, \qquad (3.5\text{-}2)$$

$$P^*(t \mid \ell) = \prod_{x \in \text{SUBRUN}} \binom{n(x)}{\ell(x)} \big(r(t \mid x)\big)^{n(x)-\ell(x)} \big(1 - r(t \mid x)\big)^{\ell(x)}. \qquad (3.5\text{-}3)$$


Under the assumption that the default system tree is an OR tree (spares exhaustion of any stage fails the system), equation 3.5-1 reduces to equation 3.3-5 for the case of one SUBRUN. For the case of more than one SUBRUN, the interpretation of equation 3.5-1 and its relation to equation 3.3-5 is no longer clear.
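For one SUBRUN, (3.5-1) adds $Q(t \mid \ell)$ over $L_s$ and $P^*(t \mid \ell)$ over $U_s$. The sketch below implements (3.5-3) and that inner bracket. Two assumptions are made for illustration. First, $L_s$ is taken as the set of fault vectors that leave each stage with at least m(x) good modules, where m(x) plays the role it has in (3.5-5). Second, the values of $Q$ are supplied by the caller. The stage sizes, reliabilities and function names are hypothetical. With every $Q$ set to zero (perfect coverage), the result is the probability of spares exhaustion in an OR tree.

```python
from itertools import product
from math import comb, prod

def p_star(l, n, r):
    """(3.5-3): P*(t|l) for fault vector l, stage sizes n and stage reliabilities r(t|x)."""
    return prod(comb(nx, lx) * rx ** (nx - lx) * (1.0 - rx) ** lx
                for lx, nx, rx in zip(l, n, r))

def fault_vectors(n):
    """All fault vectors l with 0 <= l(x) <= n(x)."""
    return product(*(range(nx + 1) for nx in n))

def subrun_unreliability(q, n, r, in_L):
    """Inner bracket of (3.5-1): Q(t|l) over l in L_s plus P*(t|l) over l in U_s.
    q maps fault-vector tuples in L_s to Q(t|l); missing entries count as 0."""
    return sum(q.get(l, 0.0) if in_L(l) else p_star(l, n, r) for l in fault_vectors(n))

n, m, r = (3, 2), (2, 1), (0.9, 0.8)                                # illustrative stages
in_L = lambda l: all(nx - lx >= mx for lx, nx, mx in zip(l, n, m))  # spares not exhausted
perfect = subrun_unreliability({}, n, r, in_L)  # all Q = 0, i.e. perfect coverage
```

Here `perfect` equals $1 - (0.972)(0.96) = 0.06688$. Summing `p_star` over every fault vector returns 1, which is a quick consistency check on (3.5-3).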

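• System Fault Tree

In this case, the system unreliability is computed by GNFLTVC and CARE3 as follows:

$$1 - R(t) = \sum_{\text{SUBRUNs}} \sum_{\ell \in L_s} Q(t \mid \ell) + \sum_{\text{MINTRMs}} \prod_x \begin{cases} P^*(t \mid x) & : \min(x) = 1 \\ 1 - P^*(t \mid x) & : \min(x) = 0 \end{cases} \qquad (3.5\text{-}4)$$

where

$$P^*(t \mid x) = \sum_{\ell(x)=n(x)-m(x)+1}^{n(x)} \binom{n(x)}{\ell(x)} \big(r(t \mid x)\big)^{n(x)-\ell(x)} \big(1 - r(t \mid x)\big)^{\ell(x)} \qquad (3.5\text{-}5)$$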

and the MINTRMs in the second term of equation 3.5-4 are defined by FTREE from the system fault tree. The second term in equation 3.5-4 appears to be a correct interpretation of the FTREE output MINTRM file and a good approximation to the second term in equation 3.3-5. However, for the first term, the relation between $L_s$ for a SUBRUN and the system fault tree (i.e., L in equation 3.3-5) is not clear and appears to be in error even for the case of one SUBRUN. For the case of more than one SUBRUN, the interpretation of the first term in equation 3.5-1 and its relation to the first term in equation 3.3-5 is no longer clear.
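The second term of (3.5-4) can be sketched directly from (3.5-5). The code below assumes that the MINTRM vectors are mutually exclusive failed states, each with one 0/1 component per stage. That reading follows the fault-vector description in Section 3.5.2, but it is not taken from the FTREE code. The function names and the two-stage example are hypothetical. For an OR tree over two stages, the sum reproduces $1 - (1 - P_A)(1 - P_B)$.

```python
from math import comb, prod

def p_star_exhausted(n, m, r):
    """(3.5-5): probability that at least n - m + 1 of the n stage modules are faulty,
    i.e. that fewer than m good modules remain; r = r(t|x)."""
    return sum(comb(n, l) * r ** (n - l) * (1.0 - r) ** l for l in range(n - m + 1, n + 1))

def mintrm_term(mintrms, n, m, r):
    """Second term of (3.5-4): for each MINTRM vector, multiply P*(t|x) over stages
    marked 1 and 1 - P*(t|x) over stages marked 0, then sum over the vectors."""
    ps = [p_star_exhausted(nx, mx, rx) for nx, mx, rx in zip(n, m, r)]
    return sum(prod(p if bit == 1 else 1.0 - p for bit, p in zip(v, ps)) for v in mintrms)

# Two-stage OR tree: every vector with at least one failed stage is a failed state.
or_tree = [(1, 0), (0, 1), (1, 1)]
value = mintrm_term(or_tree, (3, 2), (2, 1), (0.9, 0.8))
```

In this example the stage probabilities are 0.028 and 0.04, so the term equals $1 - (0.972)(0.96) = 0.06688$.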

BCS is currently investigating the questions raised by these observations. 
Table 3.5-1
RELIABILITY CALCULATIONS (Non-$\ell$-dependent)

FUNCTION | DESCRIPTION | SUBROUTINE | ARRAYS | PARAMETERS
$\lambda(t \mid x_i)$ | Rate of occurrence of category-$x_i$ faults at time t. | FLAM | - | $\lambda(x_i)$, $\omega(x_i)$
$\Lambda(t \mid x_i)$ | Cumulative rate of occurrence of category-$x_i$ faults. | FCLAM | - | $\lambda(t \mid x_i)$
$r(t \mid x_i)$ | Probability that a stage-x module has not experienced a category-$x_i$ fault by time t. | FRXIFF | - | $\Lambda(t \mid x_i)$, $h_{DPT}(t \mid x_i)$
$r(t \mid x)$ | Reliability of a stage-x module at time t. | CRXFF | RXAR | $r(t \mid x_i)$
$a(t \mid x_i)$ | Probability that a stage-x module has a latent, non-transient (transient) category-$x_i$ fault at time t, given that it has (not) experienced a non-transient or leaky transient fault by time t. | CAXLAT | AXIAR | $H_L(t \mid x_i)$, $r(t \mid x)$



Table 3.5-1 (Continued)
RELIABILITY CALCULATIONS (Non-$\ell$-dependent)

FUNCTION | DESCRIPTION | SUBROUTINE | ARRAYS | PARAMETERS
$a(t \mid x)$ | Probability that a stage-x module has a latent, non-transient fault at time t, given that it has experienced some non-transient fault at time t. | FDSCRTL, FBCRTL | - | $a(t \mid x_i)$
$h_{DPT}(t \mid x_i)$ | Rate at which a transient, category-$x_i$ fault is detected as permanent at time t. | FHSFST | PDP | $a_{DP}, b_{DP}, c_{DP}$; $m^0_{DP}, m^1_{DP}, m^2_{DP}$
$H_L(t \mid x_i)$ | Probability of a latent, category-$x_i$ fault at time t. | FHSFST | PLAT (TAPE9) | $a_L, b_L, c_L$
$h_F(t \mid x_i)$ | Rate of error propagation system failure due to a category-$x_i$ fault at time t. | FHSFST | GORHSF | $a_F, b_F, c_F$; $m^0_F, m^1_F, m^2_F$
$g_F(t \mid x_i)$ | Rate of error propagation system failure due to a category-$x_i$ fault at time t, given that it was latent prior to t. | FGST | GORHSF (TAPE10) | $h_F(t \mid x_i)$, $a(t \mid x_i)$


Table 3.5-1 (Continued)
RELIABILITY CALCULATIONS (Non-$\ell$-dependent)

FUNCTION | DESCRIPTION | SUBROUTINE | ARRAYS | PARAMETERS
$H_{\bar B}(t \mid x_i)$ | Probability of a non-benign, latent category-$x_i$ fault at time t. | FHSFST | PNBNG | $a_{\bar B}, b_{\bar B}, c_{\bar B}$; $M^0_{\bar B}, M^1_{\bar B}, M^2_{\bar B}$
$g_{\bar B}(t \mid x_i)$ | Probability of a non-benign, latent category-$x_i$ fault at time t, given that it was latent prior to t. | FGST | GORHSF (TAPE11) | $H_{\bar B}(t \mid x_i)$, $a(t \mid x_i)$
$H_B(t \mid x_i)$ | Probability of a benign, latent category-$x_i$ fault at time t. | FHSFST | PBNG | $a_B, b_B, c_B$; $M^0_B, M^1_B, M^2_B$
$h_{DF}(t \mid x_i, y_j)$ | Rate of system failure due to critically coupled category-$x_i$ and category-$y_j$ faults at time t. | FHDFST | HDFPTS (TAPE12) | $a_{DF}, b_{DF}, c_{DF}$; $m^0_{DF}, m^1_{DF}, m^2_{DF}$



Table 3.5-2
RELIABILITY CALCULATIONS ($\ell$-dependent)

FUNCTION | DESCRIPTION | SUBROUTINE | ARRAYS | PARAMETERS
$P^*(t \mid \ell)$ | Probability that a system has sustained exactly $\ell$ failures by time t. | FPSTAR, FPSTREC | - | n, $\ell$, $r(t \mid x)$
$P(\mu(x),t \mid \ell(x))$ | Probability that a system has $\mu(x)$ stage-x latent, permanent faults, given that it has $\ell(x)$ faults. | FPMUX | - | $\mu$, $\ell$, $a(t \mid x)$
$K(t \mid \ell)$ | | SUMMAT | SUMK | $c(t \mid \ell, y_j)$, $P^*(t \mid \ell)$, $A'(t \mid \ell)$, $a'(t \mid \ell)$
$c(t \mid \ell, y_j)$ | Probability of system failure due to a category-$y_j$ fault at time t, given $\ell$ faults at time t. | FCYJ | GNBNG | $g_{\bar B}(t \mid x_i)$, $D(t \mid \ell, x_i, y_j)$



Table 3.5-2 (Continued)
RELIABILITY CALCULATIONS ($\ell$-dependent)

FUNCTION | DESCRIPTION | SUBROUTINE | ARRAYS | PARAMETERS
$A'(t \mid \ell)$ | Rate of system failure due to critical fault conditions for $\ell$ faults at time t. | FAC | HDFPTS, HLAT | $h_{DF}(t \mid x_i, y_j)$, $H_L(t \mid x_i)$, $B(t \mid \ell, x_i, y_j)$
$a'(t \mid \ell)$ | Rate of system failure due to error propagation for $\ell$ faults at time t. | FAPC | GFLD | n, $\ell$, $g_F(t \mid x_i)$, $a(t \mid x_i)$
$D(t \mid \ell, x_i, y_j)$ | Expected number of category-$x_i$, -$y_j$ critical faults, given $\ell$ faults at time t, that would be created as the result of a stage-y fault at time t. | FDSCRTL | BXYAR | $a(t \mid x_i)$, $a(t \mid x)$, $P(\mu(x),t \mid \ell(x))$, $b_{xy}$
$B(t \mid \ell, x_i, y_j)$ | Expected number of category-$x_i$, -$y_j$ critical faults, given $\ell$ faults at time t. | FBCRTL | BXYAR | $a(t \mid x_i)$, $P(\mu(x),t \mid \ell(x))$, $b_{xy}$



3.5.5 Basic Reliability Functions 


The basic reliability functions are the time-dependent rate and probability functions used in the calculation of the reliability for a subsystem (i.e., a CARE-III SUBRUN). The subroutines, arrays or I/O units used to compute and store the function values were specified in Tables 3.5-1 and 3.5-2. In this section the definition of each function, as obtained from the CARE-III code, is presented along with remarks about any inconsistencies with the CARE-III documents and/or BCS' analysis of the CARE-III reliability model. In the definitions, the functions are defined for any time $t \ge 0$, but in the CARE3 program the functions are evaluated only at the discrete time points for which the reliability model is solved:

$$t_j = (j-1)\,\Delta t, \qquad j = 1, 2, \ldots, j_{\max}.$$

• $\lambda(t \mid x_i) = \omega(x_i)\, \lambda(x_i)^{\omega(x_i)}\, t^{\omega(x_i)-1}$   (3.5-1)


• A(t|x.) = 



3.5-2 


3.5-3 


The CARE-III documents give different definitions for A(t|x^) and 
A (tlx.); although the documented definitions are consistent with each 
other, the user will not obtain the Weibull failure model he expects. 
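A quick numerical check of the Weibull form may help when comparing the code with the documents. The Python sketch below assumes that equation 3.5-1 is the standard Weibull hazard $\omega(x_i)\lambda(x_i)^{\omega(x_i)}t^{\omega(x_i)-1}$, whose integral is the cumulative hazard $(\lambda(x_i)t)^{\omega(x_i)}$. The function names and parameter values are hypothetical and are not taken from CARE3.

```python
import math


def weibull_hazard(t: float, lam: float, omega: float) -> float:
    """Assumed form of eq 3.5-1: omega * lam**omega * t**(omega - 1)."""
    return omega * lam ** omega * t ** (omega - 1.0)


def weibull_cumulative_hazard(t: float, lam: float, omega: float) -> float:
    """Closed-form integral of the hazard above over [0, t]: (lam * t)**omega."""
    return (lam * t) ** omega


def integrate_hazard(t: float, lam: float, omega: float, n: int = 20000) -> float:
    """Midpoint-rule integral of the hazard on [0, t]; midpoints avoid t = 0,
    where the hazard is singular when omega < 1."""
    h = t / n
    return h * sum(weibull_hazard((k + 0.5) * h, lam, omega) for k in range(n))


# Hypothetical parameters: rate 1e-4 per hour, shape 1.5, mission time 1000 hours.
lam, omega, t = 1.0e-4, 1.5, 1000.0
numeric = integrate_hazard(t, lam, omega)
closed = weibull_cumulative_hazard(t, lam, omega)
print(numeric, closed, math.exp(-closed))  # last value: Weibull survival probability
```

With $\omega = 1$ the hazard reduces to the constant rate $\lambda$, which is a convenient sanity check when comparing other parameterizations.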


• $$r(t|x_i) = \begin{cases} \exp\!\left[-\Lambda(t|x_i) - \int_0^t h_{DPT}(\tau|x_i)\,d\tau\right] & : x_i \text{ non-transient fault} \\ H_L(t|x_i) & : x_i \text{ transient fault} \end{cases}$$  (3.5-4)

• $$r(t|x) = \prod_i r(t|x_i)$$

• $$a(t|x_i) = \begin{cases} 1 - r(t|x) & : x_i \text{ non-transient fault} \\ 1 & : x_i \text{ transient fault} \end{cases}$$  (3.5-5)

• $$a(t|x) = \sum_i a(t|x_i)$$  (3.5-6)


The function $a(t|x)$ is computed as defined in equation 3.5-6 in two places: in subroutines FDSCRTL and FBCRTL, to compute $D(t|\underline{\ell},x_i,y_j)$ and $B(t|\underline{\ell},x_i,y_j)$, and in subroutine FPMUX, to compute $P(\mu(x),t|\ell(x))$. BCS has determined that the sum in equation 3.5-6 should be taken over only the non-transient category-$x_i$ faults to make $a(t|x)$ consistent with its definition and use in CARE-III.
• $P^*(t|\underline{\ell}) =$  (3.5-7)

• $$P(\mu(x),t|\ell(x)) = \binom{\ell(x)}{\mu(x)}\bigl(1 - a(t|x)\bigr)^{\ell(x)-\mu(x)}\bigl(a(t|x)\bigr)^{\mu(x)}$$  (3.5-8)

• $$h_{DPT}(t|x_i) = \int_0^t p_{DP}(\tau|k(x_i))\,\lambda(t-\tau|x_i)\,d\tau$$  (3.5-9)

$$= a_{DP}(t|x_i)\,m^0_{DP}(t|k(x_i)) + b_{DP}(t|x_i)\,m^1_{DP}(t|k(x_i)) + c_{DP}(t|x_i)\,m^2_{DP}(t|k(x_i))$$  (3.5-10)

• $$H_L(t|x_i) = \int_0^t P_L(\tau|k(x_i))\,\lambda(t-\tau|x_i) \begin{cases} r(t-\tau|x) & : x_i \text{ non-transient fault} \\ 1 & : x_i \text{ transient fault} \end{cases} d\tau$$  (3.5-11)

$$= a_L(t|x_i)\,M^0_L(t|k(x_i)) + b_L(t|x_i)\,M^1_L(t|k(x_i)) + c_L(t|x_i)\,M^2_L(t|k(x_i))$$  (3.5-12)



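• $$g_F(t|x_i) = \frac{h_F(t|x_i)}{a(t|x_i)} \times \begin{cases} \dfrac{1}{1 - r(t|x)} & : x_i \text{ non-transient fault} \\ 1 & : x_i \text{ transient fault} \end{cases}$$  (3.5-13)

• $$h_F(t|x_i) = \int_0^t p_F(\tau|k(x_i))\,\lambda(t-\tau|x_i) \begin{cases} r(t-\tau|x) & : x_i \text{ non-transient fault} \\ 1 & : x_i \text{ transient fault} \end{cases} d\tau$$  (3.5-14)

$$= a_F(t|x_i)\,m^0_F(t|k(x_i)) + b_F(t|x_i)\,m^1_F(t|k(x_i)) + c_F(t|x_i)\,m^2_F(t|k(x_i))$$  (3.5-15)

• $$H_{\bar B}(t|x_i) = \int_0^t P_{\bar B}(\tau|k(x_i))\,\lambda(t-\tau|x_i) \begin{cases} r(t-\tau|x) & : x_i \text{ non-transient fault} \\ 1 & : x_i \text{ transient fault} \end{cases} d\tau$$  (3.5-16)

$$= a_{\bar B}(t|x_i)\,M^0_{\bar B}(t|k(x_i)) + b_{\bar B}(t|x_i)\,M^1_{\bar B}(t|k(x_i)) + c_{\bar B}(t|x_i)\,M^2_{\bar B}(t|k(x_i))$$  (3.5-17)

• $$G_{\bar B}(t|x_i) = \frac{H_{\bar B}(t|x_i)}{a(t|x_i)} \times \begin{cases} \dfrac{1}{1 - r(t|x)} & : x_i \text{ non-transient fault} \\ 1 & : x_i \text{ transient fault} \end{cases}$$  (3.5-18)

• $$H_B(t|x_i) = \int_0^t P_B(\tau|k(x_i))\,\lambda(t-\tau|x_i) \begin{cases} r(t-\tau|x) & : x_i \text{ non-transient fault} \\ 1 & : x_i \text{ transient fault} \end{cases} d\tau$$  (3.5-19)

$$= a_B(t|x_i)\,M^0_B(t|k(x_i)) + b_B(t|x_i)\,M^1_B(t|k(x_i)) + c_B(t|x_i)\,M^2_B(t|k(x_i))$$  (3.5-20)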



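• $$h_{DF}(t|x_i,y_j) = \int_0^t p_{DF}(\tau|k(x_i),k(y_j))\,H_B(t-\tau|x_i)\,\lambda(t-\tau|y_j) \begin{cases} r(t-\tau|x)\,r(t-\tau|y) & : x_i \text{ and } y_j \text{ non-transient faults} \\ r(t-\tau|x) & : x_i \text{ non-transient and } y_j \text{ transient faults} \\ r(t-\tau|y) & : x_i \text{ transient and } y_j \text{ non-transient faults} \\ 1 & : x_i \text{ and } y_j \text{ transient faults} \end{cases} d\tau$$  (3.5-21)

$$= a_{DF}(t|x_i,y_j)\,m^0_{DF}(t|k(x_i),k(y_j)) + b_{DF}(t|x_i,y_j)\,m^1_{DF}(t|k(x_i),k(y_j)) + c_{DF}(t|x_i,y_j)\,m^2_{DF}(t|k(x_i),k(y_j))$$  (3.5-22)

• $$K(t|\underline{\ell}) = \sum_y \sum_j c\bigl(t|\underline{\ell}-\underline{1}(y),y_j\bigr)\,P^*\bigl(t|\underline{\ell}-\underline{1}(y)\bigr)\,\bigl(n(y)-\ell(y)+1\bigr)\,\lambda(t|y_j) + \bigl[\Lambda'(t|\underline{\ell}) + a'(t|\underline{\ell})\bigr]\,P^*(t|\underline{\ell})$$  (3.5-23)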



The evaluation of $K(t|\underline{\ell})$ requires the calculation of $P^*(t|\underline{\ell})$ and of $P^*(t|\underline{\ell}-\underline{1}(y))$ for $y = 1, \ldots, N$. In subroutine SUMMAT, N calls are made to subroutine FPSTAR to compute $P^*(t|\underline{\ell}-\underline{1}(y))$. One further call is made to subroutine FPSTREC to compute $P^*(t|\underline{\ell})$ from $P^*(t|\underline{\ell}-\underline{1}(N))$ by the recurrence relation:

$P^*(t|\underline{\ell}) =$  (3.5-24)

This approach requires N calculations of the $P^*$ function. As equation 3.5-7 shows, each calculation requires the evaluation of a combinatorial term. An improved procedure would be to compute $P^*(t|\underline{\ell})$ once using subroutine FPSTAR and then compute each $P^*(t|\underline{\ell}-\underline{1}(y))$ by equation 3.5-24. This approach requires only one calculation of a combinatorial term.
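The saving can be illustrated with the binomial probability of equation 3.5-8, assuming it has the form $\binom{\ell}{\mu}(1-a)^{\ell-\mu}a^{\mu}$. The Python sketch below is an analogy, not the FPSTAR/FPSTREC logic, since no particular form of equation 3.5-7 is assumed here. It obtains every term from $(1-a)^{\ell}$ and the ratio of successive terms, $\frac{\ell-\mu}{\mu+1}\cdot\frac{a}{1-a}$, so no combinatorial coefficient is evaluated directly. Function names and values are hypothetical.

```python
import math
from typing import List


def binomial_direct(ell: int, a: float) -> List[float]:
    """P(mu | ell) = C(ell, mu) (1 - a)**(ell - mu) a**mu, one math.comb call per term."""
    return [math.comb(ell, mu) * (1.0 - a) ** (ell - mu) * a ** mu for mu in range(ell + 1)]


def binomial_recurrence(ell: int, a: float) -> List[float]:
    """Same values from P(0 | ell) = (1 - a)**ell and the ratio of successive terms."""
    probs = [(1.0 - a) ** ell]
    ratio = a / (1.0 - a)  # assumes 0 <= a < 1
    for mu in range(ell):
        probs.append(probs[-1] * (ell - mu) / (mu + 1) * ratio)
    return probs


ell, a = 6, 0.3  # hypothetical values of l(x) and a(t|x)
direct, recur = binomial_direct(ell, a), binomial_recurrence(ell, a)
print(max(abs(p - q) for p, q in zip(direct, recur)))  # agreement to rounding error
```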




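The CARE-III documents give a different definition for $c(t|\underline{\ell},y_j)$, i.e.,

• $$c(t|\underline{\ell},y_j) = \sum_x \sum_i \frac{\cdots}{H_L(t|x_i)}$$  (3.5-27)

BCS is currently investigating this apparent problem.

• $$\Lambda'(t|\underline{\ell}) = \sum_x \sum_i \sum_y \sum_j \frac{h_{DF}(t|x_i,y_j)\,B(t|\underline{\ell},x_i,y_j)}{H_L(t|x_i)\,H_L(t|y_j)}$$  (3.5-28)

• $$a'(t|\underline{\ell}) = \sum_x \sum_i a(t|x_i)\,g_F(t|x_i) \begin{cases} \ell(x) & : x_i \text{ non-transient fault} \\ n(x) - \ell(x) & : x_i \text{ transient fault} \end{cases}$$  (3.5-29)

$$= \sum_x \sum_i h_F(t|x_i) \begin{cases} \dfrac{\ell(x)}{1 - r(t|x)} & : x_i \text{ non-transient fault} \\ n(x) - \ell(x) & : x_i \text{ transient fault} \end{cases}$$  (3.5-30)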


BCS has identified a programming error in subroutine FAPC, which computes $a'(t|\underline{\ell})$. The error has been corrected in the BCS version of CARE-III, and NASA has been informed of the problem.


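• $$D(t|\underline{\ell},x_i,y_j) = \sum_{\mu(x)} \sum_{\mu(y)} b_{xy}\bigl(\ell(x)-\mu(x),\,\ell(y)-\mu(y)\bigr)\,P\bigl(\mu(x),t|\ell(x)\bigr)\,P\bigl(\mu(y),t|\ell(y)\bigr) \begin{cases} \mu(x)\,\dfrac{a(t|x_i)}{a(t|x)} & : x_i \text{ non-transient} \\ \bigl(n(x)-\ell(x)\bigr)\,a(t|x_i) & : x_i \text{ transient} \end{cases}$$  (3.5-31)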


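• $$B(t|\underline{\ell},x_i,y_j) = \sum_{\mu(x)} \sum_{\mu(y)} b_{xy}\bigl(\ell(x)-\mu(x),\,\ell(y)-\mu(y)\bigr)\,P\bigl(\mu(x),t|\ell(x)\bigr)\,P\bigl(\mu(y),t|\ell(y)\bigr)\,c(x_i,y_j)\,a(t|x_i)\,a(t|y_j)$$  (3.5-32)

where

$$c(x_i,y_j) = \begin{cases}
\dfrac{\mu(x)\,\mu(y)}{a(t|x)\,a(t|y)} & : x \ne y,\ x_i \text{ and } y_j \text{ non-transient faults} \\
\dfrac{\mu(x)\,(\mu(x)-1)}{a(t|x)\,a(t|x)} & : x = y,\ x_i \text{ and } y_j \text{ non-transient faults} \\
(n(x)-\ell(x))\,(n(y)-\ell(y)) & : x \ne y,\ x_i \text{ and } y_j \text{ transient faults} \\
(n(x)-\ell(x))\,(n(x)-\ell(x)-1) & : x = y,\ x_i \text{ and } y_j \text{ transient faults} \\
\dfrac{\mu(x)\,(n(y)-\ell(y))}{a(t|x)} & : x_i \text{ non-transient and } y_j \text{ transient faults} \\
\dfrac{(n(x)-\ell(x))\,\mu(y)}{a(t|y)} & : x_i \text{ transient and } y_j \text{ non-transient faults}
\end{cases}$$  (3.5-33)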



3.6 COMPUTATIONAL METHODS 


The previous section outlined the implementation of the reliability model in the CARE3 program; this section reviews the computational methods used to solve the model equations. The objective is to document the numerical procedures that are implemented and to present the BCS evaluation of these algorithms in the CARE-III environment. In the following sections the numerical procedures are first highlighted and then discussed in detail.

3.6.1 Overview of Algorithms 

In this section each of the numerical procedures used in the CARE3 program 
is highlighted. The requirements for the algorithm and its implementation 
are discussed first, followed by a preview of the BCS analysis of the 
algorithm in the CARE-III context. In the succeeding sections a detailed 
description and analysis of each algorithm is provided. 

• Numerical Integration

The calculation of the unreliability $Q(t|\underline{\ell})$ for a fault vector $\underline{\ell}$ requires the calculation of the integral of $K(t|\underline{\ell})$:

$$Q(t|\underline{\ell}) = \int_0^t K(\tau|\underline{\ell})\,d\tau$$  (3.6-1)

$K(t|\underline{\ell})$ can be computed from the reliability data, the critical-pairs fault tree(s) data and the output of the coverage model (see Section 3.2). The numerical integration procedure is based on Simpson's Rule and is implemented in subroutines UNRELQ, SUMMAT and FINTGRT. Both $Q(t|\underline{\ell})$ and $K(t|\underline{\ell})$ are computed at the discrete time points of the reliability model, which are equally spaced with step size $\Delta t$.

BCS has carefully reviewed UNRELQ, FINTGRT and SUMMAT and the subroutines which they use. The review detected no programming errors in the implementation of Simpson's Rule. However, a programming error was discovered in subroutine FAPC (called by SUMMAT); this error is discussed in Section 3.5.5.

• Numerical Convolution

The calculation of $K(t|\underline{\ell})$ for a fault vector $\underline{\ell}$ requires several calculations of the convolution of two functions, $P_1(t)$ and $P_2(t)$:

$$y(t) = \int_0^t P_2(\tau)\,P_1(t-\tau)\,d\tau$$  (3.6-2)

Here $P_1(t)$ is a measure of the rate at which a certain class of fault occurs. $P_2(\tau)$ is a function of the interval $\tau$ between that occurrence and the entry of the fault into a particular coverage-model state. Each of the output coverage functions ($p_{DP}$, $P_L$, $p_F$, $P_{\bar B}$, $P_B$ and $p_{DF}$) enters into such a convolution calculation (see Section 3.2). The numerical convolution procedure is based on the method of moments and is implemented in subroutines FHSFST, FHDFST, ABCST, FFSFST and FFDFST.

BCS has carefully reviewed FHSFST, FHDFST, ABCST, FFSFST and FFDFST and the subroutines which they use. The review detected no programming errors in the implementation of the method of moments. However, BCS has raised questions about the accuracy of the method of moments for calculating the required convolutions; these questions are addressed in detail in Section 3.6.3.


3.6.2 Numerical Integration 

The numerical integration procedure used in the CARE3 program is based on Simpson's Rule and is implemented in subroutines UNRELQ, SUMMAT and FINTGRT. It is used specifically to compute the integrals:

$$Q(t_j|\underline{\ell}) = \int_0^{t_j} K(\tau|\underline{\ell})\,d\tau, \qquad j = 1, 2, \ldots, j_{max}$$  (3.6-3)

where $\underline{\ell}$ is a fault vector and $t_j$, $j = 1, 2, \ldots, j_{max}$, are the equally spaced discrete time points of the reliability model:

$$t_j = (j-1)\,\Delta t, \qquad j = 1, 2, \ldots, j_{max}$$  (3.6-4)

The function $K(t|\underline{\ell})$ is evaluated by subroutine SUMMAT (see Section 3.2 for a description of the calculations involved). The integrals are evaluated by subroutine UNRELQ as follows:

• Case 1: j = 1

$$Q(t_1|\underline{\ell}) = 0$$  (3.6-5)

• Case 2: j > 1, j even

$$Q(t_j|\underline{\ell}) = \sum_{k=2,\ k \text{ even}}^{j} F_k$$  (3.6-6)

• Case 3: j > 1, j odd

$$Q(t_j|\underline{\ell}) = \sum_{k=3,\ k \text{ odd}}^{j} F_k$$  (3.6-7)

where the $F_k$ are computed in subroutine FINTGRT:

• Case 1: k = 1

$$F_k = 0$$  (3.6-8)

• Case 2: k = 2

$$F_k = \frac{\Delta t}{2}\bigl(K(t_1|\underline{\ell}) + K(t_2|\underline{\ell})\bigr)$$  (3.6-9)

• Case 3: k > 2

$$F_k = \frac{\Delta t}{3}\bigl(K(t_{k-2}|\underline{\ell}) + 4K(t_{k-1}|\underline{\ell}) + K(t_k|\underline{\ell})\bigr)$$  (3.6-10)
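The scheme of equations 3.6-5 to 3.6-10 can be sketched in Python as follows. This is an illustration under stated assumptions, not a transcription of UNRELQ or FINTGRT. The K values are assumed to be available at $t_j = (j-1)\Delta t$. $F_2$ is the trapezoid term, and $F_k$ for $k > 2$ is a Simpson panel with weight $\Delta t/3$. The test rate $K(t) = e^{-t}$ and the function names are hypothetical.

```python
import math
from typing import List


def panel_terms(k_vals: List[float], dt: float) -> List[float]:
    """F_k for k = 1..j_max, stored 0-based: F_1 = 0, F_2 is a trapezoid and
    F_k (k > 2) is a Simpson panel over [t_{k-2}, t_k]."""
    f = [0.0] * len(k_vals)
    if len(k_vals) > 1:
        f[1] = 0.5 * dt * (k_vals[0] + k_vals[1])
    for i in range(2, len(k_vals)):
        f[i] = dt / 3.0 * (k_vals[i - 2] + 4.0 * k_vals[i - 1] + k_vals[i])
    return f


def cumulative_unreliability(k_vals: List[float], dt: float) -> List[float]:
    """Q(t_j) for j = 1..j_max, using running sums of the even-indexed terms and
    of the odd-indexed terms with k >= 3."""
    f = panel_terms(k_vals, dt)
    q = [0.0] * len(k_vals)  # Q(t_1) = 0
    even_sum = odd_sum = 0.0
    for j in range(2, len(k_vals) + 1):
        if j % 2 == 0:
            even_sum += f[j - 1]
            q[j - 1] = even_sum
        else:
            odd_sum += f[j - 1]
            q[j - 1] = odd_sum
    return q


# Hypothetical rate K(t) = exp(-t), whose exact integral is 1 - exp(-t).
dt = 0.01
times = [j * dt for j in range(101)]  # t_j = (j - 1) dt for j = 1..101
q = cumulative_unreliability([math.exp(-t) for t in times], dt)
print(q[-1], 1.0 - math.exp(-times[-1]))
```

Odd j use only Simpson panels. Even j begin with one trapezoid step, so their $Q(t_j)$ values carry a slightly larger error from the first interval.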

3.6.3 Numerical Convolution 

The numerical convolution procedure used in the CARE3 program is based on the method of moments and is implemented in subroutines FHSFST, FHDFST, ABCST, FFSFST and FFDFST. It is used specifically to compute the convolutions:

$$y(t_j) = \int_0^{t_j} P_2(\tau)\,P_1(t_j-\tau)\,d\tau$$  (3.6-11)

where $P_1(t)$ is a reliability model function of the form:

$$P_1(t) = \begin{cases} \lambda_{x_i}(t) & : \text{single fault} \\ \lambda_{x_i y_j}(t) & : \text{double fault} \end{cases}$$  (3.6-12)

$P_2(t)$ is one of the coverage model output functions $p_{DP}$, $P_L$, $p_F$, $P_{\bar B}$, $P_B$ and $p_{DF}$. The points $t_j$, $j = 1, 2, \ldots, j_{max}$, are the equally spaced discrete time points of the reliability model:

$$t_j = (j-1)\,\Delta t, \qquad j = 1, 2, \ldots, j_{max}$$  (3.6-13)

The procedure is based on two critical assumptions: that $P_1(t)$ is a much more slowly varying function of time than $P_2(t)$, and that $P_2(t)$ decays rapidly to zero:

$$P_2(\tau) \approx 0, \qquad \tau > t_0 > 0$$  (3.6-14)

$$P_1(t_j-\tau) \approx a(t_j) + \tau\,b(t_j) + \tau^2 c(t_j), \qquad 0 \le \tau \le t_0$$  (3.6-15)

Under these assumptions:

$$y(t_j) \approx \int_0^{t_0} P_2(\tau)\bigl(a(t_j) + \tau\,b(t_j) + \tau^2 c(t_j)\bigr)\,d\tau = a(t_j)\int_0^{t_0} P_2(\tau)\,d\tau + b(t_j)\int_0^{t_0} \tau\,P_2(\tau)\,d\tau + c(t_j)\int_0^{t_0} \tau^2 P_2(\tau)\,d\tau$$  (3.6-16)

$$= a(t_j)\,M^0(t_0) + b(t_j)\,M^1(t_0) + c(t_j)\,M^2(t_0)$$  (3.6-17)

where

$$M^i(t_0) = \int_0^{t_0} \tau^i\,P_2(\tau)\,d\tau, \qquad i = 0, 1, 2$$  (3.6-18)

The time $t_0$ and the coefficients $a(t_j)$, $b(t_j)$ and $c(t_j)$ are computed by subroutine ABCST. They are chosen so that the approximation to $P_1(t_j-\tau)$ is exact at $t_j - t_0$, $t_j - t_0/2$ and $t_j$. Subroutines FFSFST and FFDFST are called by ABCST to compute $P_1(t)$ for the single and double fault cases, respectively. Finally, the approximate values of $y(t_j)$ are evaluated by subroutines FHSFST and FHDFST for the single and double fault cases, respectively.

BCS has raised questions about the accuracy of the method of moments, as implemented in CARE-III, for computing the convolution in equation 3.6-11. This convolution provides the crucial link between the coverage model and the reliability model, so its accurate evaluation is important for the CARE-III estimate of reliability. The assumption that $P_1(t)$ varies much more slowly with time than $P_2(t)$ is consistent with the CARE-III assumption that coverage rates are much higher than module failure rates. Therefore, this assumption should not degrade the accuracy of the solution.

However, the assumption that $P_2(t)$ decays rapidly to zero may not be valid for all coverage model parameters. Indeed, for some cases $P_2(\tau)$ decays to a non-zero steady-state value. In such a case two sources of error develop in the calculation. First, the neglected contribution to the convolution for $\tau > t_0$ may become significant. Second, the quadratic approximation to $P_1$ becomes less accurate as $t_0$ becomes larger. In the CARE-III implementation this potential source of error is neither estimated nor monitored.

BCS is currently investigating the impact of this problem on the CARE-III reliability estimate by running cases with coverage models that do not satisfy the assumptions in equations 3.6-14 and 3.6-15.
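Both failure modes described above can be reproduced with a small Python sketch of the method of moments (equations 3.6-14 to 3.6-18). For illustration only, it assumes a slowly decaying exponential $P_1$ and two candidate coverage functions $P_2$. The first decays to zero well before $t_0$; the second levels off at a non-zero value. The functions and parameter values are hypothetical. The moments are computed by Simpson's rule rather than by the CARE3 subroutines, and a direct quadrature of equation 3.6-11 serves as the reference.

```python
import math
from typing import Callable, Tuple

Func = Callable[[float], float]


def simpson(f: Func, lo: float, hi: float, n: int = 2000) -> float:
    """Composite Simpson's rule on [lo, hi] with an even number of panels n."""
    n += n % 2
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def truncated_moments(p2: Func, t0: float) -> Tuple[float, float, float]:
    """M^i(t0) = integral over [0, t0] of tau**i * P2(tau), i = 0, 1, 2."""
    m0 = simpson(p2, 0.0, t0)
    m1 = simpson(lambda tau: tau * p2(tau), 0.0, t0)
    m2 = simpson(lambda tau: tau * tau * p2(tau), 0.0, t0)
    return m0, m1, m2


def quadratic_coeffs(p1: Func, tj: float, t0: float) -> Tuple[float, float, float]:
    """a, b, c such that a + b*tau + c*tau**2 equals P1(tj - tau) at tau = 0, t0/2, t0."""
    h = 0.5 * t0
    f0, fm, f1 = p1(tj), p1(tj - h), p1(tj - t0)
    c = (f1 - 2.0 * fm + f0) / (2.0 * h * h)
    b = (fm - f0) / h - c * h
    return f0, b, c


def convolve_by_moments(p1: Func, p2: Func, tj: float, t0: float) -> float:
    """y(tj) ~ a*M^0 + b*M^1 + c*M^2; any part of P2 beyond t0 is ignored."""
    a, b, c = quadratic_coeffs(p1, tj, t0)
    m0, m1, m2 = truncated_moments(p2, t0)
    return a * m0 + b * m1 + c * m2


def convolve_direct(p1: Func, p2: Func, tj: float, n: int = 20000) -> float:
    """Reference value: Simpson's rule applied to the full convolution integral."""
    return simpson(lambda tau: p2(tau) * p1(tj - tau), 0.0, tj, n)


LAM, BETA, TJ, T0 = 0.01, 50.0, 10.0, 0.5  # hypothetical values


def p1(t: float) -> float:
    return LAM * math.exp(-LAM * t)  # slowly varying fault-occurrence rate


def fast(tau: float) -> float:
    return BETA * math.exp(-BETA * tau)  # decays to zero well before T0


def plateau(tau: float) -> float:
    return 0.2 + 0.8 * math.exp(-BETA * tau)  # non-zero steady state


for name, p2 in (("fast", fast), ("plateau", plateau)):
    approx, ref = convolve_by_moments(p1, p2, TJ, T0), convolve_direct(p1, p2, TJ)
    print(name, approx, ref, abs(approx - ref) / ref)
```

With these values the first case agrees closely with the direct integral. The second case loses most of the convolution, because the part of $P_2$ beyond $t_0$ is discarded.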
Section 4
COVERAGE MODEL

The theory and implementation of the CARE-III coverage model are described in this section. The mathematical details of the coverage model have been extracted from the following sources:
• the CARE-III documentation;
• J. J. Stiffler, L. A. Bryant and L. Guccione (1979);
• J. J. Stiffler and L. A. Bryant (1982);
• J. J. Stiffler, J. S. Neumann and L. A. Bryant (1982);
• the CARE-III program (Version 3, 1982).
In those areas where the documentation is vague or incomplete, BCS has completed the model specifications based on its understanding of the applicable reliability methods.

Sections 4.1 and 4.2 present the single and double fault coverage models and their mathematical solution in terms of a sequence of Volterra integral equations of the second kind. The implementation of the coverage models is described in Sections 4.3 and 4.4. An overview of the COVRGE program is presented in Section 4.3, and the computational methods used in COVRGE are highlighted in Section 4.4.

Any discrepancies found by BCS during the Task 1 review, between either the CARE-III documentation or the COVRGE program and the coverage model, are pointed out in the discussion in Section 4.
4.1 STOCHASTIC COVERAGE MODEL


The two Coverage Models are used to analyze the latency period of a fault and the interaction of latent faults in a critical pair. These models are described in detail in Sections 3.1.3 and 3.1.4. They are used to derive the transition rates for the Aggregate Model, as shown in Section 3.4.

The coverage functions needed in the calculation of these rates are the state probabilities for the different coverage states and the intensities of entry into the failure states.

All the calculations are done within the context of the coverage dynamics, independently of the dynamics of the rest of the system. The reversibility argument given in Section 3.1.6 for transient faults (backward transition in the fault vector $\underline{\ell}$) is therefore irrelevant for the Coverage Models. The functions obtained in Section 4.2 are thus valid for both transient and non-transient faults.

More specifically, the functions needed from each of the Coverage Models are as follows:

• From the Single Fault Coverage Model, SFCM, and for each fault category $x_i$:

   the intensity of entry at time t into failure state F: $p_F(t)$; and
   the probabilities that at time t the fault is in
      benign state B: $P_B(t)$,
      non-benign state $\bar B$: $P_{\bar B}(t)$,
      latent state L: $P_L(t)$,
      detected-as-permanent state DP: $P_{DP}(t)$,
   where $\bar B$ = A or E, and L = B or $\bar B$ for a non-transient fault, L = $\bar B$ for a transient fault.

• From the Double Fault Coverage Model, DFCM, and for each pair of fault categories $x_i$, $y_j$:

   the intensity of entry at time t into the Double-Fault failure state DF: $p_{DF}(t)$.

In all the above functions, time t is measured from the time of first entry into state A for the SFCM, and from the time of first entry into state $A_2$ for the DFCM.

Holding times in each state are not necessarily exponentially distributed, but they are independent of past dynamics. The Coverage Models, SFCM and DFCM, are therefore homogeneous semi-Markov processes, with respective initial states A and $B_1A_2$. Section 4.2 shows how properties of semi-Markov processes (see Appendix) are used to derive the functions needed to solve the reliability problem.
4.2 SOLUTION OF COVERAGE MODELS
4.2.1 Single Fault Coverage Model

The Single Fault Coverage Model is shown in Figure 4.2-1. Transitions out of each state are measured from the time of entry into that state, except for the two error states. For the error states, transitions are measured from the time of entry into the active-error state $A_E$. The essence of this interpretation is that detection schedules are independent of fluctuations between the active and benign states. The model can be reduced by replacing the two error states by a single state E, shown in Figure 4.2-1 as a rectangular block. This interpretation is not consistent with the description of the SFCM given by J. J. Stiffler and L. A. Bryant (1982), but it coincides with the present version of CARE-III.

The input parameters $(\alpha, \beta, \delta(t), \rho(t), \epsilon(t), C, P_A, P_B)$ determine the transition probability distributions $Q_{ij}(t)$. The derivatives of $Q_{ij}(t)$ and the holding time distributions $h_i(t)$ are given in Table 4.2-1. These are then used to calculate the first entry or return distributions $F_{ij}(t)$ or the entry intensities $f_{ij}(t)$, and from these the state probabilities $P_{ij}(t)$.

The model corresponding to the case $P_A = P_B = 1$ is simpler and affords an easier solution. The general model can then be solved from the simpler one as follows.

Let $F_x(t)$ denote the probability of being in state x at time t, given that $P_A = P_B = 1$ (i.e., given that the faulty element has not yet been returned to active state A after possible detection). Let $G_x(t)$ denote the probability of the same event, but with no restrictions on $P_A$ and $P_B$. Let $A(t)$ denote the intensity of first return to active state A after detection.
Figure 4.2-1 Single Fault Coverage Model
(States: A, active; B, benign; D, detected; E, error; F, failure; DP, detected as permanent (non-transient). t = time from entry into active state A; τ = time from entry into error state E.)
$$f(t) = \frac{\beta + \alpha\,e^{-(\alpha+\beta)t}}{\alpha+\beta}$$

A(t) = Dirac's delta

Then it follows that, for any state x different from B,

$$G_x(t) = F_x(t) + \int_0^t A(u)\,G_x(t-u)\,du$$

where the integral accounts for all possible cycles through the system until ending in state x at time t.

It also follows that

$$A(t) = (1-P_A)\,\phi_A(t) + (1-P_B)\int_0^t \phi_B(u)\,\beta\,e^{-\beta(t-u)}\,du$$

where $\phi_A(t)$ and $\phi_B(t)$ are the intensities of entry into states $A_D$ and $B_D$, respectively, given $P_A = P_B = 1$.

For state B a modification is needed: instead of $F_B(t)$, use $F_B(t) + X_B(t)$. Here $X_B(t)$ is the probability of entering $B_D$ for the first time and then remaining in the benign state until time t:

$$X_B(t) = (1-P_B)\int_0^t \phi_B(u)\,e^{-\beta(t-u)}\,du$$

Table 4.2-2 summarizes the Single-Fault Model Equations, with the corresponding definitions and mathematical expressions.

As an example, the first two formulas are derived. $\phi(t)$ is $\beta^{-1}$ times the intensity of re-entry into state A exactly t time units after the previous entry. $P_a(t)$ is the probability of being in state A at time t when $P_A = P_B = 1$.
Using the notation given in the Appendix and the numbering of states in Table 4.2-1, these terms are $\beta^{-1} f_{11}(t)$ and $P_{11}(t)$, respectively. So for the first function,

$$\beta\,\phi(t) = f_{11}(t) = \int_0^t Q_{12}(dx)\,f_{21}(t-x)$$

where

$$\dot Q_{12}(t) = \alpha\,e^{-\alpha t}\,d(t)\,r(t) \quad\text{and}\quad f_{21}(t) = \dot Q_{21}(t) = \beta\,e^{-\beta t}.$$

Hence

$$\phi(t) = \alpha\,e^{-\beta t}\int_0^t e^{-x(\alpha-\beta)}\,d(x)\,r(x)\,dx.$$

Similarly, for the second function,

$$P_a(t) = P_{11}(t) = h_1(t) + \int_0^t F_{11}(dx)\,P_{11}(t-x) = h_1(t) + \int_0^t f_{11}(x)\,P_{11}(t-x)\,dx = e^{-\alpha t}\,d(t)\,r(t) + \beta\int_0^t \phi(t-x)\,P_a(x)\,dx.$$

The mathematical expressions shown in Table 4.2-2 for the functions $X_B(t)$ and $F_x(t)$ differ from those given by J. J. Stiffler and L. A. Bryant (1981), and from those implemented in the present version of CARE-III. The suggested changes account for possible returns to the benign state B from state $B_D$ with probability $1-P_B$. They therefore affect the calculation of $P_B(t)$, the probability of being in the benign state at time t. The present implemented version is only valid if $P_B = 0$.
Table 4.2-2
Single-Fault Model Equations

FUNCTION: $\phi(t)$
   MATHEMATICAL EXPRESSION*: $\alpha e^{-\beta t}\int_0^t e^{-(\alpha-\beta)u}\,r(u)\,d(u)\,du$
   DEFINITION: $\beta^{-1}$ times the probability intensity of re-entering state A exactly t time units after the previous entry.

FUNCTION: $P_a(t)$
   MATHEMATICAL EXPRESSION*: $e^{-\alpha t}\,r(t)\,d(t) + \beta\int_0^t \phi(t-u)\,P_a(u)\,du$
   DEFINITION: Probability of being in state A at time t when $P_A = P_B = 1$.

FUNCTION: $P_b(t)$
   MATHEMATICAL EXPRESSION*: $\phi(t) + \beta\int_0^t \phi(t-u)\,P_b(u)\,du$
   DEFINITION: Probability of being in state B at time t when $P_A = P_B = 1$.

FUNCTION: $P_e(t)$
   MATHEMATICAL EXPRESSION*: $\int_0^t \cdots\,\rho(u)\,d(u)\,e(t-u)\,du + \beta\int_0^t \phi(t-u)\,P_e(u)\,du$
   DEFINITION: Probability of being in state $A_E$ or $B_E$ at time t when $P_A = P_B = 1$.

FUNCTION: $\phi_B(t)$
   MATHEMATICAL EXPRESSION*: $\int_0^t \cdots\,du$
   DEFINITION: Intensity of entry into state $B_D$ at time t for the first time.

FUNCTION: $X_B(t)$
   MATHEMATICAL EXPRESSION*: $(1-P_B)\int_0^t \phi_B(u)\,e^{-\beta(t-u)}\,du$
   DEFINITION: Probability of having entered state $B_D$ for the first time and then remaining in the benign state until time t.

FUNCTION: $P_{dp}(t)$
   MATHEMATICAL EXPRESSION*: $P_A\int_0^t \phi_A(u)\,du + P_B\int_0^t \phi_B(u)\,du$
   DEFINITION: Probability that a fault has been diagnosed as permanent by time t.

FUNCTION: $G_x(t)$
   MATHEMATICAL EXPRESSION*: $F_x(t) + \int_0^t \bigl[(1-P_A)\,\phi_A(t-u) + \beta X_B(t-u)\bigr]\,G_x(u)\,du$
   DEFINITION: Function relating probabilities and intensities derived when $P_A = P_B = 1$ to those same quantities when $P_A$ and $P_B$ are arbitrary.

FUNCTION: $P_B(t)$
   MATHEMATICAL EXPRESSION*: $G_x(t)$ with $F_x(t) = P_b(t) + X_B(t)$
   DEFINITION: Probability of being in state B at time t.

FUNCTION: $P_{\bar B}(t)$
   MATHEMATICAL EXPRESSION*: $G_x(t)$ with $F_x(t) = P_a(t) + P_e(t)$
   DEFINITION: Probability of being in a non-benign state at time t.

FUNCTION: $P_L(t)$
   MATHEMATICAL EXPRESSION*: $G_x(t)$ with
      $F_x(t) = P_b(t) + X_B(t) + P_a(t) + P_e(t)$ for non-transient faults,
      $F_x(t) = P_a(t) + P_e(t)$ for transient faults
   DEFINITION: Probability of a latent fault or undetected error at time t.

FUNCTION: $P_{DP}(t)$
   MATHEMATICAL EXPRESSION*: $G_x(t)$ with $F_x(t) = P_{dp}(t)$
   DEFINITION: Probability that a fault has been diagnosed as permanent by time t.

* t here is a measure of the time since the entry into state A.





4.2.2 Double-Fault Coverage Model


A detailed version of the DFCM, consistent with the SFCM, is shown in Figure 4.2-2. CARE-III considers a simplified version in which a fault detected as non-permanent causes immediate failure of the system. For example, from state $B_1D_2$ there is an instantaneous transition either to state $DP_2$, with probability $P_{A_2}$, or to state DF, with probability $1-P_{A_2}$. This change is represented by the dashed lines into state DF. The simplified model results in a higher intensity of entry into state DF, and hence in a smaller (conservative) value for the reliability of the system.

The holding times for this model follow the stochastic characteristics of those in the SFCM, so the Double-Fault Model is also a semi-Markov process. The transition probability densities, $Q_{ij}(t)$, are given in Table 4.2-3.

Figure 4.2-3 represents the Double Fault Coverage Model as given by J. J. Stiffler and L. A. Bryant (1982). Note that in this representation the parameters for transitions into states F and D do not correspond to competing transitions, as they normally do in such graphical representations of semi-Markov processes.

Using the formulas of the Appendix, and given the functions $c_1(t)$, $c_3(t)$, $c_4(t)$ and $f_1(t)$ as defined in Table 4.2-4, it follows that

$$p_{DF}(t) = c_1(t) + \int_0^t f_1(x)\,p_4(t-x)\,dx$$

Here $p_4(t)$ is the intensity of entry into state F, t units of time after entry into state $B_1B_2$. It is given by

$$p_4(t) = c_4(t) + \int_0^t c_3(x)\,p_4(t-x)\,dx.$$

Straightforward analysis, e.g., using Laplace transforms, shows that these formulas are equivalent to those given in Table 4.2-4.
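The equivalence can also be checked numerically. The Python sketch below solves both pairs of equations: the pair above, and the $p_3$ form of Table 4.2-4. It uses a generic trapezoidal scheme for Volterra equations of the second kind, which is not the VOLTERA or CVLTAR algorithm. The exponential stand-ins for $c_1$, $c_3$, $c_4$ and $f_1$ are hypothetical, not CARE-III values.

```python
import math
from typing import Callable, List


def sample(f: Callable[[float], float], h: float, n: int) -> List[float]:
    """Values f(k*h) for k = 0..n on a uniform grid."""
    return [f(k * h) for k in range(n + 1)]


def convolve(f: List[float], g: List[float], h: float) -> List[float]:
    """Trapezoidal approximation of the integral over [0, t_k] of f(t_k - u) g(u) du."""
    out = [0.0]
    for k in range(1, len(f)):
        s = 0.5 * (f[k] * g[0] + f[0] * g[k]) + sum(f[k - m] * g[m] for m in range(1, k))
        out.append(h * s)
    return out


def solve_volterra(g: List[float], kern: List[float], h: float) -> List[float]:
    """Trapezoidal solution of y(t) = g(t) + integral over [0, t] of kern(t - u) y(u) du."""
    y = [g[0]]
    for k in range(1, len(g)):
        s = 0.5 * kern[k] * y[0] + sum(kern[k - m] * y[m] for m in range(1, k))
        y.append((g[k] + h * s) / (1.0 - 0.5 * h * kern[0]))
    return y


# Hypothetical stand-ins for c1, c3, c4 and f1 of Table 4.2-4.
H, N = 0.005, 400
c1 = sample(lambda t: 0.2 * math.exp(-2.0 * t), H, N)
c3 = sample(lambda t: 0.6 * math.exp(-3.0 * t), H, N)
c4 = sample(lambda t: 0.3 * math.exp(-t), H, N)
f1 = sample(lambda t: 0.5 * math.exp(-t), H, N)

# Form in the text: p4 = c4 + c3 * p4, then p_DF = c1 + f1 * p4.
p4 = solve_volterra(c4, c3, H)
pdf_text = [a + b for a, b in zip(c1, convolve(f1, p4, H))]

# Form in Table 4.2-4: p3 = f1 + c3 * p3, then p_DF = c1 + c4 * p3.
p3 = solve_volterra(f1, c3, H)
pdf_table = [a + b for a, b in zip(c1, convolve(c4, p3, H))]

print(max(abs(a - b) for a, b in zip(pdf_text, pdf_table)))
```

In Laplace-transform terms both routes give $\tilde p_{DF} = \tilde c_1 + \tilde f_1\tilde c_4/(1-\tilde c_3)$. The two discrete solutions should therefore differ only by discretization error.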
TABLE 4.2-3 Double Fault Coverage Model
Transition Probability Densities $Q_{ij}(t)$

From \ To   B1A2           A1B2           B1B2     D                                   F
B1A2        -              -              f1(t)    P_A2 δ2(t) a2(t) r2(t) b1(t)        c1(t)
A1B2        -              -              f2(t)    P_A1 δ1(t) a1(t) r1(t) b2(t)        c2(t)
B1B2        β2(t) b1(t)    β1(t) b2(t)    -        -                                   -
D           -              -              -        -                                   -
F           -              -              -        -                                   -

$c_i(t)$, $f_i(t)$ as given in Table 4.2-4;
$\beta_i(t) = \beta_i \exp(-\beta_i t)$, $b_i(t) = \exp(-\beta_i t)$;
$\alpha_i(t) = \alpha_i \exp(-\alpha_i t)$, $a_i(t) = \exp(-\alpha_i t)$.







Figure: Double Fault Coverage Model



Table 4.2-4
Double-Fault Model Equations

FUNCTION: $c_i(t)$, $i = 1, 2$, $j = 3-i$
   MATHEMATICAL EXPRESSION*: $\beta_i(t)\,d_j(t)\,r_j(t)\,a_j(t) + (1-P_{A_j})\,b_i(t)\,\delta_j(t)\,r_j(t)\,a_j(t) + b_i(t)\,d_j(t)\,\rho_j(t)\,a_j(t)$
   DEFINITION: Transition density from state $A_jB_i$ to state F.

FUNCTION: $f_i(t)$, $i = 1, 2$, $j = 3-i$
   MATHEMATICAL EXPRESSION*: $\alpha_j(t)\,b_i(t)\,d_j(t)\,r_j(t)$
   DEFINITION: Transition density from state $A_jB_i$ to state $B_1B_2$.

FUNCTION: $c_4(t)$
   MATHEMATICAL EXPRESSION*: $\int_0^t \bigl[c_1(t-u)\,\beta_2(u)\,b_1(u) + c_2(t-u)\,\beta_1(u)\,b_2(u)\bigr]\,du$
   DEFINITION: Intensity of entry into state F t time units after the last entry into state $B_1B_2$.

FUNCTION: $c_3(t)$
   MATHEMATICAL EXPRESSION*: $\int_0^t \bigl[f_1(t-u)\,\beta_2(u)\,b_1(u) + f_2(t-u)\,\beta_1(u)\,b_2(u)\bigr]\,du$
   DEFINITION: Intensity of re-entry into state $B_1B_2$ t time units after a previous entry.

FUNCTION: $p_3(t)$
   MATHEMATICAL EXPRESSION*: $f_1(t) + \int_0^t c_3(t-u)\,p_3(u)\,du$
   DEFINITION: Intensity of last entry into state $B_1B_2$ t time units after entry into state $B_1A_2$.

FUNCTION: $p_{DF}(t)$
   MATHEMATICAL EXPRESSION*: $c_1(t) + \int_0^t c_4(t-u)\,p_3(u)\,du$
   DEFINITION: Intensity of entry into state F t time units after entry into state $B_1A_2$.




4.3 IMPLEMENTATION IN CARE-III

The previous sections reviewed the formulation of the coverage model; this section describes in some detail the implementation of the model in the program COVRGE. The objective is to document the model that is actually implemented and to outline the calculations performed. The following sections describe the overall structure and data flow of the COVRGE program, outline the solution of the coverage model, and define the basic coverage functions.

4.3.1 Overview of COVRGE Program

Figure 4.3-1 illustrates the overall data flow for the COVRGE program. The user's input data for the coverage model is read from file CREIN by the input program CAREIN. It includes the fault type parameters $\alpha$, $\beta$, $\delta$, $\rho$, $\epsilon$, C, $P_A$ and $P_B$. After the data is checked and preprocessed by CAREIN, it is passed to COVRGE on file COVIN. COVRGE then solves the coverage model for each fault type and computes the functions $p_{DP}$, $P_L$, $p_F$, $P_B$, $P_{\bar B}$ and $p_{DF}$. The moments of the coverage functions are computed and passed to the reliability program CARE3 on file CVGMTS. In addition, the coverage functions are passed to the plotting program CVGPLT on files SNGFL and DBLFL.

Figure 4.3-2 provides a high-level functional description of the COVRGE program. A brief explanation of the functions in the figure follows:

• Computation Control

These subroutines control the computations for solving the single and double fault models; the details of the computational sequence are given in Section 4.3.2. Figures 4.3-3 and 4.3-4 illustrate the control structure of subroutines SNGFLT and DBLFLT with call trees. From a control point of view the two subroutines are quite similar.
• Single and Double Fault Functions

The subroutines in this group compute two kinds of function: the basic distribution and survival functions for the coverage model, and the elementary functions used in the solution of the coverage equations. The function definitions are given in Section 4.3.3.

• Numerical Integration

The subroutines in this group are used to numerically compute the integral (over a time interval) of a function. This kind of calculation is required in the coverage model solution and for the evaluation of the moments of the coverage functions. The numerical methods used in these subroutines are discussed in detail in Section 4.4.

• Numerical Convolution

The subroutines in this group are used to numerically compute the convolution of two functions and to solve Volterra integral equations of the second kind. This kind of calculation is required in the coverage model solution. These are the most crucial numerical subroutines in the COVRGE program; the numerical methods used are discussed in detail in Section 4.4.

• Support Functions

These are "library" type subroutines, used by all the other subroutines in the program for very basic operations or calculations.

Figure 4.3-5 illustrates the data structure used in the COVRGE program to store all functions of time computed during the solution of the coverage model. This data structure will be referred to as a CARE-III, Type A function array. The figure shows that the discrete time array has a special structure: the step size only increases, monotonically, by factors of two. This data structure reflects the expectation, on the part of the CARE-III developers, that all functions of interest rapidly decay to zero or to a steady-state value. The impact of this data structure on the performance of the numerical software in the COVRGE program will be discussed in Section 4.4.
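A time grid of this kind can be sketched as follows. The report does not say how many points are taken at each step size, so the block length, initial step and end time below are hypothetical. The sketch only reproduces the stated property that the step size increases monotonically by factors of two; it is not the COVRGE array layout.

```python
from typing import List


def doubling_time_grid(dt0: float, points_per_block: int, t_max: float) -> List[float]:
    """Time points starting at 0 whose step size begins at dt0 and doubles after
    every block of points_per_block steps, until t_max is reached or passed.

    points_per_block is a hypothetical parameter: the report states only that the
    step size increases monotonically by factors of two.
    """
    times = [0.0]
    dt = dt0
    while times[-1] < t_max:
        for _ in range(points_per_block):
            times.append(times[-1] + dt)
            if times[-1] >= t_max:
                break
        dt *= 2.0
    return times


grid = doubling_time_grid(dt0=0.01, points_per_block=8, t_max=10.0)
steps = [b - a for a, b in zip(grid, grid[1:])]
print(len(grid), steps[0], steps[-1])
```

Coarse late steps are economical when the stored functions are nearly constant there, which is the expectation stated above. They lose accuracy if a function is still changing after the step has grown large.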
FIGURE 4.3-1 Data Flow for COVRGE Program


FIGURE 4.3-2 Functional Structure of COVRGE Program
(Subroutines shown: COMPFUN, GENMNTS, FGSNGL, VSTPINT, SFG12, SIMPINT, FCDBL, ARHOMNT, FFDBL, SUMARS, FBDBL, TMAXSNG, TMAXDBL, CNSRTDN, RTDNINT, PREVNRC, FTCHSTP, VLTNREC, FILLSNG, CVLTAR, FILLDBL, VOLTERA, GENTMAR, CNVLINT, FRETVAL, FCNVLTM, STPINDX, FLINTP, PREEXP, BUFBLK, PRNTCVG.)







SNGFLT
   COMPFUN
      FGSNGL
      SFG12
   SUMARS
   VSTPINT
   PREVNRC
      VLTNREC
      CNVLINT
   VOLTERA
      CNVLINT
   CVLTAR
      VOLTERA
         CNVLINT
   GENMNTS
   TMAXSNG
   BUFBLK

FIGURE 4.3-3 SNGFLT Call Tree
DBLFLT
   COMPFUN
      FCDBL
      FBDBL
      FFDBL
   SUMARS
   PREVNRC
      VLTNREC
      CNVLINT
   VOLTERA
      CNVLINT
   GENMNTS
   TMAXDBL
   BUFBLK

FIGURE 4.3-4 DBLFLT Call Tree
Function: $y = f(t)$

Discrete approximation: $y_j = f(t_j)$, $j = 1, 2, \ldots, j_{max}$

FIGURE 4.3-5 CARE-III Type-A Function Array



4.3.2 Outline of Calculations


The output coverage functions are evaluated by subroutines SNGFLT and DBLFLT under the control of the main program COVRGE. For each fault type, SNGFLT performs a series of calculations to evaluate all the equations of the single fault coverage model listed in Table 4.2-2. Moments of the functions $p_{DP}$, $P_L$, $p_F$, $P_B$ and $P_{\bar B}$ are output by SNGFLT for later use in the reliability calculation. Similarly, for each pair of fault types, DBLFLT performs a series of calculations to evaluate all the equations of the double fault coverage model listed in Table 4.2-4. Moments of the function $p_{DF}$ are output by DBLFLT for later use in the reliability calculation.

The calculations done in SNGFLT and DBLFLT are outlined in Tables 4.3-1 and 4.3-2, respectively. The arrays named as inputs or outputs for each calculation are CARE-III, Type A function arrays (see Figure 4.3-5). The calculation types are defined as follows.

• Function Evaluation (G or F)

The single fault functions, $G_1$ to $G_6$, are evaluated by subroutines FGSNGL and SFG12. The double fault functions, $F_{C1}$, $F_{C2}$, $F_{F1}$, $F_{F2}$, $F_{B1}$ and $F_{B2}$, are evaluated by subroutines FCDBL, FFDBL and FBDBL. In each case the function is evaluated under the control of subroutine COMPFUN, which stores the function values as a CARE-III, Type A function array.

• Summation (SUM)

The sum of two functions is computed by subroutine SUMARS and stored as a CARE-III, Type A function array. SUMARS expects the two input functions to be stored as CARE-III, Type A function arrays.
• Integration (INT)

The integral of a function over a time interval is computed by subroutine 
VSTPINT and stored as a CARE-III, Type A function array. VSTPINT expects 
the input function to be stored as a CARE-III, Type A function array. 

• Convolution (CNV)

The convolution of two functions is computed by subroutine PREVNRC (and 
VLTNREC) and stored as a CARE-III, Type A function array. PREVNRC expects 
the two input functions to be stored as CARE-III, Type A function arrays. 

• Volterra Integral Equation (VIE)

The solution of a Volterra integral equation defined by an input function
and a kernel function is computed by subroutines VOLTERA or CVLTAR and
stored as a CARE-III, Type A function array. VOLTERA and CVLTAR expect
the input and kernel functions to be stored as CARE-III, Type A function
arrays.

The numerical procedures for computing sums, integrals, convolutions and 
solving Volterra integral equations are discussed in detail in 
Section 4.4. 
TABLE 4.3-1

SINGLE FAULT CALCULATIONS

      Function      Calc.  Subroutine   Input Arrays         Output Arrays
                    Type
 1.   φ             G1     FGSNGL-1     -                    FEEAR
 2.   P_a           G2     FGSNGL-2     -                    GAR
                    VIE    VOLTERA      GAR, FEEAR           PA
 3.   P_b           VIE    VOLTERA      FEEAR, FEEAR         PB1
 4.   P_e           G3     FGSNGL-3     -                    GAR
                    G4     FGSNGL-4     -                    PERR
                    CNV    PREVNRC      0., PERR, GAR        FAR
                    VIE    VOLTERA      FAR, FEEAR           PERR
 5.   (illegible)   VIE    (illegible)  GAR, FEEAR           PEAR
 6.   (illegible)   G5     FGSNGL-5     -                    GAR
                    VIE    VOLTERA      GAR, FEEAR           PNEAR
 7.   P_f           G6     FGSNGL-6     -                    GAR
                    CNV    PREVNRC      0., GAR, PEAR        PFLD
 8.   ψ_A           G7     FGSNGL-7     -                    GAR
                    CNV    PREVNRC      PNEAR, GAR, PEAR     PSIA
 9.   ψ_B           G8     FGSNGL-8     -                    GAR
                    CNV    PREVNRC      0., GAR, PEAR        PSIB
10.   χ_B           G9     FGSNGL-9     -                    GAR
                    CNV    PREVNRC      0., GAR, PSIB        PB2
11.   p_DP          INT    VSTPINT      PSIA                 GAR
                    INT    VSTPINT      PSIB                 FAR
                    SUM    SUMARS       PSIA, PSIB           PDP
12.   (illegible)   SUM    SUMARS       PSIA, PSB2           FEEAR
13.   ε             INT    VSTPINT      FEEAR                FAR
14.   P_DP          VIE    CVLTAR       PDP                  PDP
15.   P_A           VIE    CVLTAR       PA                   PA
16.   P_B1          VIE    CVLTAR       PB1                  PB1
17.   P_B2          VIE    CVLTAR       PB2                  PB2
18.   P_E           VIE    CVLTAR       PERR                 PERR
19.   P_F           VIE    CVLTAR       PFLD                 PFLD
20.   P_B           SUM    SUMARS       PB1, PB2             PBNG
21.   (illegible)   SUM    SUMARS       PA, PERR             PNBNG
22.   (illegible)   SUM    SUMARS       PBNG, PNBNG          PLAT



TABLE 4.3-2

DOUBLE FAULT CALCULATIONS

Function  Calc. Type  Subroutine  Input Arrays        Output Arrays
c_1       F_C1        FCDBL-1     -                   C1AR
c_2       F_C2        FCDBL-2     -                   C2AR
f_1       F_F1        FFDBL-1     -                   F1AR
f_2       F_F2        FFDBL-2     -                   F2AR
b_1       F_B1        FBDBL-1     -                   B1AR
b_2       F_B2        FBDBL-2     -                   B2AR
C_4       CNV         PREVNRC     0., C1AR, B1AR      XB1INTG
          CNV         PREVNRC     0., C2AR, B2AR      XB2INTG
          SUM         SUMARS      XB1INTG, XB2INTG    C4AR
C_3       CNV         PREVNRC     0., F1AR, B1AR      XB1INTG
          CNV         PREVNRC     0., F1AR, B2AR      XB1INTG
          SUM         SUMARS      XB1INTG, XB2INTG    C3AR
P_3       VIE         VOLTERA     F1AR, C3AR          P3AR
P_DF      VIE         VOLTERA     C1AR, C4AR          PDFAR


4.3.3 Basic Coverage Functions


The basic coverage functions are the distribution and survival functions
for the various fault types and the elementary functions used in the
evaluation of the equations for the coverage model (see Section 4.2). The
distribution and survival functions are computed in subroutines CNSRTDN
and RTDNINT, respectively. The single fault functions, $G_1$ to $G_{12}$,
are computed in subroutines FGSNGL and SFG12, and the double fault
functions, $F_{C1}$, $F_{C2}$, $F_{F1}$, $F_{F2}$, $F_{B1}$ and $F_{B2}$,
are computed in subroutines FCDBL, FFDBL and FBDBL.

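For a fault type with parameters α, β, δ, ρ, ε, δ_F, ρ_F and ε_F
(see Section 4.2), the distribution and survival functions are computed as
follows.

• Active-to-Benign Transition:

$$\alpha(t) = \alpha e^{-\alpha t} \tag{4.3-1}$$

$$a(t) = e^{-\alpha t} \tag{4.3-2}$$

• Benign-to-Active Transition:

$$\beta(t) = \beta e^{-\beta t} \tag{4.3-3}$$

$$b(t) = e^{-\beta t} \tag{4.3-4}$$

• Fault Detection:

$$\delta(t) = \begin{cases} \delta e^{-\delta t}, & \delta_F = \text{.F.} \\ \delta, & \delta_F = \text{.T.},\ 0 \le t \le 1/\delta \\ 0, & \delta_F = \text{.T.},\ t > 1/\delta \end{cases} \tag{4.3-5}$$

$$d(t) = \begin{cases} e^{-\delta t}, & \delta_F = \text{.F.} \\ 1 - \delta t, & \delta_F = \text{.T.},\ 0 \le t \le 1/\delta \\ 0, & \delta_F = \text{.T.},\ t > 1/\delta \end{cases} \tag{4.3-6}$$

• Error Generation:

$$\rho(t) = \begin{cases} \rho e^{-\rho t}, & \rho_F = \text{.F.} \\ \rho, & \rho_F = \text{.T.},\ 0 \le t \le 1/\rho \\ 0, & \rho_F = \text{.T.},\ t > 1/\rho \end{cases} \tag{4.3-7}$$

$$r(t) = \begin{cases} e^{-\rho t}, & \rho_F = \text{.F.} \\ 1 - \rho t, & \rho_F = \text{.T.},\ 0 \le t \le 1/\rho \\ 0, & \rho_F = \text{.T.},\ t > 1/\rho \end{cases} \tag{4.3-8}$$

• Error Propagation:

$$\epsilon(t) = \begin{cases} \epsilon e^{-\epsilon t}, & \epsilon_F = \text{.F.} \\ \epsilon, & \epsilon_F = \text{.T.},\ 0 \le t \le 1/\epsilon \\ 0, & \epsilon_F = \text{.T.},\ t > 1/\epsilon \end{cases} \tag{4.3-9}$$

$$e(t) = \begin{cases} e^{-\epsilon t}, & \epsilon_F = \text{.F.} \\ 1 - \epsilon t, & \epsilon_F = \text{.T.},\ 0 \le t \le 1/\epsilon \\ 0, & \epsilon_F = \text{.T.},\ t > 1/\epsilon \end{cases} \tag{4.3-10}$$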
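The flag convention above (`.F.` selects the exponential form, `.T.` the uniform form on [0, 1/rate]) can be sketched in Python as follows. This is a minimal illustration, not a transcription of CNSRTDN or RTDNINT: the names `density`, `survival` and `survival_from_density` are hypothetical, and the uniform case assumes rate > 0. The last function cross-checks that each survival function equals 1 minus the integral of the matching density.

```python
import math


def density(rate: float, t: float, uniform: bool) -> float:
    """Transition density, e.g. delta(t) of eq. 4.3-5 (flag .F. -> uniform=False)."""
    if not uniform:
        return rate * math.exp(-rate * t)              # exponential holding time
    return rate if 0.0 <= t <= 1.0 / rate else 0.0    # uniform on [0, 1/rate]


def survival(rate: float, t: float, uniform: bool) -> float:
    """Survival function, e.g. d(t) of eq. 4.3-6: 1 minus the CDF of density()."""
    if not uniform:
        return math.exp(-rate * t)
    return 1.0 - rate * t if 0.0 <= t <= 1.0 / rate else 0.0


def survival_from_density(rate: float, t: float, uniform: bool, n: int = 20000) -> float:
    """Cross-check: 1 - integral_0^t density(tau) dtau by the composite midpoint rule."""
    h = t / n
    return 1.0 - h * sum(density(rate, (i + 0.5) * h, uniform) for i in range(n))


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, survival(2.0, 0.3, flag), survival_from_density(2.0, 0.3, flag))
```

The same two functions serve all three parameter pairs (δ, d), (ρ, r) and (ε, e), since only the rate and the flag differ.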


For a fault type with parameters α, β, δ, ρ, ε, δ_F, ρ_F and ε_F (see
Section 4.2), the single fault functions $G_1$ to $G_{12}$ are computed as
follows:

• G_1 Function:

$$G_1(t) = \phi(t) = G_{10}(t), \text{ with } \beta - \alpha \text{ in place of } \beta \tag{4.3-11}$$

• G_2 Function:

$$G_2(t) = a(t)\, r(t)\, d(t) \tag{4.3-12}$$

• G_3 Function:

$$G_3(t) = a(t)\, \rho(t)\, d(t) \tag{4.3-13}$$

• G_4 Function:

$$G_4(t) = e(t) \tag{4.3-14}$$

• G_5 Function:

$$G_5(t) = a(t)\, r(t)\, \delta(t) \tag{4.3-15}$$

• G_6 Function:

$$G_6(t) = \epsilon(t) \tag{4.3-16}$$
• G_7 Function:

$$G_7(t) = \begin{cases} e(t), & \alpha + \beta = 0 \\ e(t)\, \dfrac{\alpha\, a(t)\, b(t) + \beta}{\alpha + \beta}, & \alpha + \beta \ne 0 \end{cases} \tag{4.3-17}$$

• G_8 Function:

$$G_8(t) = e(t)\left(1 - a(t)\, b(t)\right) \tag{4.3-18}$$

• G_9 Function:

$$G_9(t) = b(t) \tag{4.3-19}$$


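• G_10 Function:

$$G_{10}(t) = \alpha e^{-\alpha t} \int_0^t e^{-\beta (t-u)}\, d(u)\, r(u)\, du \tag{4.3-20}$$

• Case 1: α = 0.

$$G_{10}(t) = 0$$

• Case 2: α > 0, δ_F = .F., ρ_F = .F.

$$G_{10}(t) = \begin{cases} \alpha\, t\, a(t)\, b(t), & \rho + \delta = \beta \\ \dfrac{\alpha\, a(t)\left(b(t) - r(t)\, d(t)\right)}{\rho + \delta - \beta}, & \rho + \delta \ne \beta \end{cases} \tag{4.3-21}$$

• Case 3a: α > 0, δ_F = .T., ρ_F = .F. and ρ - β ≠ 0.

$$t_0 = \min(t,\ 1/\delta) \tag{4.3-22}$$

$$\text{exp 1} = e^{-(\alpha+\beta)t} \tag{4.3-23}$$

$$\text{exp 2} = e^{-(\alpha+\beta)t}\, e^{-(\rho-\beta)t_0} \tag{4.3-24}$$

$$\text{term 1} = \frac{1}{\rho - \beta}\,(\text{exp 1} - \text{exp 2}) \tag{4.3-25}$$

$$\text{term 2} = \frac{\delta}{(\rho - \beta)^2}\left(\text{exp 1} - \text{exp 2}\,\left(1 + (\rho - \beta)\, t_0\right)\right) \tag{4.3-26}$$

$$G_{10}(t) = \alpha\,(\text{term 1} - \text{term 2}) \tag{4.3-27}$$

• Case 3b: α > 0, δ_F = .T., ρ_F = .F. and ρ - β = 0.

$$t_0 = \min(t,\ 1/\delta) \tag{4.3-28}$$

$$G_{10}(t) = \alpha e^{-(\alpha+\beta)t}\left(t_0 - \tfrac{1}{2}\,\delta\, t_0^2\right) \tag{4.3-29}$$

• Case 4a: α > 0, δ_F = .F., ρ_F = .T. and δ - β ≠ 0.

$$t_0 = \min(t,\ 1/\rho) \tag{4.3-30}$$

$$\text{exp 1} = e^{-(\alpha+\beta)t} \tag{4.3-31}$$

$$\text{exp 2} = e^{-(\alpha+\beta)t}\, e^{-(\delta-\beta)t_0} \tag{4.3-32}$$

$$\text{term 1} = \frac{1}{\delta - \beta}\,(\text{exp 1} - \text{exp 2}) \tag{4.3-33}$$

$$\text{term 2} = \frac{\rho}{(\delta - \beta)^2}\left(\text{exp 1} - \text{exp 2}\,\left(1 + (\delta - \beta)\, t_0\right)\right) \tag{4.3-34}$$

$$G_{10}(t) = \alpha\,(\text{term 1} - \text{term 2}) \tag{4.3-35}$$

• Case 4b: α > 0, δ_F = .F., ρ_F = .T. and δ - β = 0.

$$t_0 = \min(t,\ 1/\rho) \tag{4.3-36}$$

$$G_{10}(t) = \alpha e^{-(\alpha+\beta)t}\left(t_0 - \tfrac{1}{2}\,\rho\, t_0^2\right) \tag{4.3-37}$$

• Case 5a: α > 0, δ_F = .T., ρ_F = .T. and β ≠ 0.

$$t_0 = \min(t,\ 1/\delta,\ 1/\rho) \tag{4.3-38}$$

$$\text{exp 1} = e^{-(\alpha+\beta)t}\, e^{\beta t_0} \tag{4.3-39}$$

$$\text{exp 2} = e^{-(\alpha+\beta)t} \tag{4.3-40}$$

$$\text{term 1} = \frac{1}{\beta}\,(\text{exp 1} - \text{exp 2}) \tag{4.3-41}$$

$$\text{term 2} = \frac{\delta + \rho}{\beta^2}\left(\text{exp 1}\,(1 - \beta t_0) - \text{exp 2}\right) \tag{4.3-42}$$

$$\text{term 3} = \frac{2\rho\delta}{\beta^3}\left(\text{exp 2}\left(1 - \beta t_0 + \tfrac{1}{2}\beta^2 t_0^2\right) - \text{exp 1}\right) \tag{4.3-43}$$

$$G_{10}(t) = \alpha\,(\text{term 1} + \text{term 2} + \text{term 3}) \tag{4.3-44}$$

• Case 5b: α > 0, δ_F = .T., ρ_F = .T. and β = 0.

$$t_0 = \min(t,\ 1/\delta,\ 1/\rho) \tag{4.3-45}$$

$$G_{10}(t) = \alpha e^{-\alpha t}\left(t_0 - \tfrac{1}{2}(\delta + \rho)\, t_0^2 + \tfrac{1}{3}\,\delta\rho\, t_0^3\right) \tag{4.3-46}$$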
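The closed forms above can be spot-checked against direct quadrature of the $G_{10}$ definition (equation 4.3-20 in the form $G_{10}(t) = \alpha e^{-\alpha t}\int_0^t e^{-\beta(t-u)}d(u)r(u)\,du$). The Python sketch below does this for Cases 2, 3a and 5b. The function names are hypothetical, the reference integral uses composite Simpson's rule with many panels, and the parameter values are arbitrary test inputs rather than CARE-III data.

```python
import math


def surv(rate, t, uniform):
    """Survival function: exp(-rate t) for flag .F., max(0, 1 - rate t) for flag .T."""
    return max(0.0, 1.0 - rate * t) if uniform else math.exp(-rate * t)


def g10_quadrature(t, alpha, beta, delta, rho, delta_unif, rho_unif, n=20000):
    """Reference G10(t) from the integral definition, composite Simpson with n (even) panels."""
    h = t / n
    acc = 0.0
    for i in range(n + 1):
        u = i * h
        w = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        acc += w * math.exp(-beta * (t - u)) * surv(delta, u, delta_unif) * surv(rho, u, rho_unif)
    return alpha * math.exp(-alpha * t) * acc * h / 3.0


def g10_case2(t, alpha, beta, delta, rho):
    """Case 2 (both exponential), rho + delta != beta: alpha a(t)(b(t) - r(t)d(t))/(rho+delta-beta)."""
    a, b = math.exp(-alpha * t), math.exp(-beta * t)
    rd = math.exp(-(rho + delta) * t)
    return alpha * a * (b - rd) / (rho + delta - beta)


def g10_case3a(t, alpha, beta, delta, rho):
    """Case 3a (uniform detection, exponential error generation), rho - beta != 0."""
    t0 = min(t, 1.0 / delta)
    k = rho - beta
    exp1 = math.exp(-(alpha + beta) * t)
    exp2 = exp1 * math.exp(-k * t0)
    term1 = (exp1 - exp2) / k
    term2 = (delta / k ** 2) * (exp1 - exp2 * (1.0 + k * t0))
    return alpha * (term1 - term2)


def g10_case5b(t, alpha, delta, rho):
    """Case 5b (both uniform), beta = 0."""
    t0 = min(t, 1.0 / delta, 1.0 / rho)
    return alpha * math.exp(-alpha * t) * (t0 - 0.5 * (delta + rho) * t0 ** 2 + delta * rho * t0 ** 3 / 3.0)


if __name__ == "__main__":
    print(g10_case2(1.2, 0.8, 0.3, 1.7, 0.9), g10_quadrature(1.2, 0.8, 0.3, 1.7, 0.9, False, False))
    print(g10_case3a(1.2, 0.8, 0.3, 1.7, 0.9), g10_quadrature(1.2, 0.8, 0.3, 1.7, 0.9, True, False))
    print(g10_case5b(1.2, 0.8, 1.7, 0.9), g10_quadrature(1.2, 0.8, 0.0, 1.7, 0.9, True, True))
```

Cases 4a and 4b are Cases 3a and 3b with the roles of δ and ρ interchanged, so the same check carries over by swapping those two arguments and flags.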


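• G_11 Function:

$$G_{11}(t) = \frac{\alpha}{\alpha + \beta}\, a(t)\, b(t) \tag{4.3-47}$$

• G_12 Function:

$$G_{12}(t) = \frac{\beta}{(\cdot)}\left(G_1(t) - G_{10}(t)\right) + G_2(t) \tag{4.3-48}$$

(The denominator of the leading coefficient in equation 4.3-48 is illegible in the source.)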


For a pair of fault types i and j, with parameters α_i, β_i, δ_i, ρ_i, ε_i
and α_j, β_j, δ_j, ρ_j, ε_j (see Section 4.2), the double fault functions
$F_{C1}$, $F_{C2}$, $F_{F1}$, $F_{F2}$, $F_{B1}$ and $F_{B2}$ are computed
as follows.

• F_C1 Function:

$$F_{C1}(t) = \beta_i(t)\, d_j(t)\, r_j(t)\, a_j(t) + (1 - P_{A_j})\, b_i(t)\, d_j(t)\, r_j(t)\, a_j(t) + b_i(t)\, d_j(t)\, \rho_j(t)\, a_j(t) \tag{4.3-49}$$

• F_C2 Function:

$$F_{C2}(t) = \beta_j(t)\, d_i(t)\, r_i(t)\, a_i(t) + (1 - P_{A_i})\, b_j(t)\, d_i(t)\, r_i(t)\, a_i(t) + b_j(t)\, d_i(t)\, \rho_i(t)\, a_i(t) \tag{4.3-50}$$

• F_F1 Function:

$$F_{F1}(t) = a_i(t)\, b_j(t)\, d_i(t)\, r_i(t) \tag{4.3-51}$$

• F_F2 Function: expression illegible in the source (4.3-52)

• F_B1 Function: expression illegible in the source (4.3-53)

• F_B2 Function:

$$F_{B2}(t) = b_j(t)\, \beta_i(t) \tag{4.3-54}$$


BCS has identified a coding error in subroutine FGSNGL for the calculation
of G_10(t), Case 5a; in equation 4.3-42 the order of the terms exp 1 and
exp 2 should be switched.
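The Case 5a expressions can be checked by direct integration. On $[0, t_0]$ the product is $d(u)r(u) = 1 - (\delta+\rho)u + \delta\rho u^2$, and integrating each power of $u$ against $e^{\beta u}$ gives term 1, term 2 and a $\delta\rho$ contribution equal to $(2\rho\delta/\beta^3)\,(\text{exp 1}\,(1 - \beta t_0 + \tfrac{1}{2}\beta^2 t_0^2) - \text{exp 2})$. In the term 3 expression as printed in this section, exp 1 and exp 2 occupy the opposite positions, which is consistent with the BCS finding of an interchange. The Python sketch below (hypothetical names; it assumes an overall factor α, as in Cases 3 and 4) compares both arrangements with a quadrature reference.

```python
import math


def g10_quadrature_uniform(t, alpha, beta, delta, rho, n=20000):
    """Reference G10(t) for uniform detection and error generation (composite Simpson)."""
    h = t / n
    acc = 0.0
    for i in range(n + 1):
        u = i * h
        w = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        d = max(0.0, 1.0 - delta * u)   # uniform detection survival
        r = max(0.0, 1.0 - rho * u)     # uniform error-generation survival
        acc += w * math.exp(-beta * (t - u)) * d * r
    return alpha * math.exp(-alpha * t) * acc * h / 3.0


def g10_case5a(t, alpha, beta, delta, rho, as_printed=False):
    """Closed form for Case 5a (beta != 0).

    as_printed=True keeps exp 1 and exp 2 where the printed term 3 expression
    places them; False uses the arrangement obtained by direct integration.
    """
    t0 = min(t, 1.0 / delta, 1.0 / rho)
    exp2 = math.exp(-(alpha + beta) * t)
    exp1 = exp2 * math.exp(beta * t0)
    term1 = (exp1 - exp2) / beta
    term2 = ((delta + rho) / beta ** 2) * (exp1 * (1.0 - beta * t0) - exp2)
    poly = 1.0 - beta * t0 + 0.5 * beta ** 2 * t0 ** 2
    if as_printed:
        term3 = (2.0 * rho * delta / beta ** 3) * (exp2 * poly - exp1)
    else:
        term3 = (2.0 * rho * delta / beta ** 3) * (exp1 * poly - exp2)
    return alpha * (term1 + term2 + term3)


if __name__ == "__main__":
    ref = g10_quadrature_uniform(1.2, 0.8, 0.3, 1.7, 0.9)
    print(ref, g10_case5a(1.2, 0.8, 0.3, 1.7, 0.9), g10_case5a(1.2, 0.8, 0.3, 1.7, 0.9, as_printed=True))
```

With these test values the integrated arrangement agrees with the quadrature to better than 1e-6, while the printed arrangement is off by several units.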





4.4 COMPUTATIONAL METHODS

The previous section outlined the implementation of the coverage model in
the COVRGE program; this section reviews the computational methods used to
solve the model equations. Its objective is to document the numerical
procedures that are implemented and to present the BCS evaluation of these
algorithms in the CARE-III environment. In the following sections the
numerical procedures are first highlighted and then discussed in detail.

4.4.1 Overview of Algorithms 

In this section each of the numerical procedures used in the COVRGE
program is highlighted. The requirements for the algorithm and its
implementation are discussed first, followed by a preview of the BCS
analysis of the algorithm in the CARE-III context. In the succeeding
sections a detailed description and analysis of each algorithm is provided.

• Numerical Sum 


The calculation of several of the functions in the single and double
fault coverage models ($P_{DP}$, $\epsilon$, $P_B$, $P_\beta$, $P_L$, $C_3$
and $C_4$) requires the calculation of the sum of two functions. The
summation procedure uses linear interpolation and is implemented in
subroutine SUMARS. It is designed to compute the sum:

$$y(t) = C_1 y_1(t) + C_2 y_2(t), \tag{4.4-1}$$

for input functions stored in CARE-III, Type A function arrays. The
output function is stored in a CARE-III, Type A function array whose
discrete time points are selected by the numerical procedure (in
subroutine SUMARS).
BCS has carefully reviewed SUMARS and the subroutines which it uses; no
programming errors in the implementation of the summation calculation were
detected by the review (with the exception of some minor questions; see
Section 4.4.2). BCS has raised questions about several of the heuristics
used in the summation calculation, namely step size control and zero
detection; these questions are addressed in Sections 4.4.2 and 4.4.6.

• Numerical Integration 

The calculation of several of the functions in the single fault
coverage model ($p_{DP}$ and $\epsilon$) requires the calculation of the
integral of a function. In addition, the output of the COVRGE program is
the moments (for p = 0, 1, 2) of the single and double fault functions
$P_{DP}$, $P_L$, $P_F$, $P_B$, $P_\beta$ and $P_{DF}$. The numerical
integration procedure is based on Simpson's Rule and is implemented in
subroutines VSTPINT and SIMPINT. It is designed to compute the integrals:

$$y_p(t) = \int_0^t \tau^p f(\tau)\, d\tau, \quad p = 0, 1, 2 \tag{4.4-2}$$

for input functions which are stored in CARE-III, Type A function
arrays. The output moment is stored in a CARE-III, Type A function
array which has the same discrete time points as the input function.

BCS has carefully reviewed VSTPINT, SIMPINT and all the subroutines which
they use; no programming errors in the implementation of Simpson's Rule
were detected by the review.

• Numerical Convolution 


The calculation of several of the functions in the single and double
fault coverage models ($P_e$, $P_f$, $\psi_A$, $\psi_B$, $C_4$ and $C_3$)
requires the calculation of the convolution of two functions. The
numerical convolution procedure is based on the Trapezoidal Rule and is
implemented in subroutines PREVNRC, VLTNREC and CNVLINT. It is
designed to compute the convolution:

$$y(t) = f(t) + \beta \int_0^t \phi(t - \tau)\, g(\tau)\, d\tau, \tag{4.4-3}$$

for input functions which are stored in CARE-III, Type A function
arrays. The output function is stored in a CARE-III, Type A function
array which has discrete time points selected by the numerical
procedure (in subroutine VLTNREC).

BCS has carefully reviewed PREVNRC, VLTNREC, CNVLINT and the subroutines
which they use; no programming errors in the implementation of the
convolution calculation were detected by the review (with the exception of
some minor questions on CNVLINT; see Section 4.4.4). BCS has raised
questions about several of the heuristics used in the convolution
calculations, namely step size control, zero detection and constant value
detection; these questions are addressed in Sections 4.4.4 and 4.4.6.

• Numerical Solution of Volterra Integral Equations 

The solution of Volterra integral equations of the second kind is an
essential calculation for the single and double fault coverage
models; such solutions are required to compute $P_a$, $P_b$, $P_e$,
$P_{DP}$, $P_A$, $P_{B1}$, $P_{B2}$, $P_E$, $P_F$, $P_3$ and $P_{DF}$. The
numerical solution procedure is based on the Trapezoidal Rule and is
implemented in subroutines VOLTERA, CVLTAR and CNVLINT. It is designed to
solve integral equations of the form:

$$y(t) = f(t) + \beta \int_0^t \phi(t - \tau)\, y(\tau)\, d\tau \tag{4.4-4}$$
for input functions which are stored in CARE-III, Type A function
arrays. The output function is stored in a CARE-III, Type A function
array which has discrete time points selected by the numerical
procedure (in subroutine VOLTERA). Subroutine VOLTERA solves
equation 4.4-4 directly for y(t), and CVLTAR computes the solution
indirectly by solving a Volterra integral equation for y(t) - f(t).

BCS has carefully reviewed VOLTERA, CVLTAR, CNVLINT and the subroutines 
which they use; no programming errors in the implementation of the 
Volterra integral equation solution algorithm were detected by the review. 
BCS has raised questions about several of the heuristics used in the 
solution algorithm: step size control, zero detection and constant value 
detection; these questions are addressed in Sections 4.4.5 and 4.4.6. In 
addition, BCS has raised the important question of the numerical stability 
of the solution algorithm; this question is discussed in more detail in 
Section 4.4.5. 
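Substituting $z(t) = y(t) - f(t)$ into equation 4.4-4, taken in the form $y(t) = f(t) + \beta\int_0^t \phi(t-\tau)\,y(\tau)\,d\tau$, shows the kind of equation an indirect solver such as CVLTAR works with. This short derivation is added for illustration and is not taken from the CVLTAR source:

$$z(t) = \beta \int_0^t \phi(t-\tau)\, f(\tau)\, d\tau + \beta \int_0^t \phi(t-\tau)\, z(\tau)\, d\tau, \qquad y(t) = f(t) + z(t).$$

The forcing term of the equation for z is itself a convolution of the type treated in Section 4.4.4, and z(0) = 0 whenever φ and f are bounded near t = 0.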

4.4.2 Numerical Sum 


The numerical procedure used for calculating the sum of functions uses
linear interpolation and is implemented in subroutine SUMARS. The input
functions must be stored in CARE-III, Type A function arrays, and the sum
is computed for a selected set of discrete time points:

$$y_k = C_1 y_1(s_k) + C_2 y_2(s_k); \quad k = 1, 2, \ldots, k_{max}. \tag{4.4-5}$$

The discrete time points for the sum function are automatically selected
by SUMARS and the sum function is stored as a CARE-III, Type A function
array.

The sum is computed with a step-by-step procedure that is initiated by
setting:

$$s_1 = 0, \tag{4.4-6}$$

$$y_1 = C_1 y_1(s_1) + C_2 y_2(s_1). \tag{4.4-7}$$

The k-th step consists of selecting a step size $\Delta t_k$ for the step
and then computing the sum at:

$$s_{k+1} = s_k + \Delta t_k, \tag{4.4-8}$$

$$y_{k+1} = C_1 y_1(s_{k+1}) + C_2 y_2(s_{k+1}). \tag{4.4-9}$$

Linear interpolation is used to evaluate the $y_1$ and $y_2$ functions at
the discrete time points indicated in the sum in equation 4.4-9.

The summation procedure is monitored by heuristic controls which determine
when the sum function is zero or may be truncated, select the stepsize
$\Delta t_k$, and determine when the sum function cannot be obtained in the
available space in a CARE-III function array; each of these controls is
briefly described below.

The summation procedure is terminated after computing $y_k$ if one of the
following conditions is met:

• $y_{k-2} = y_{k-1} = y_k = 0$,

• $y_k < \text{TRUNC}$ (default value = .0001) and $s_k >$ maximum time for
the $y_1$ and $y_2$ functions,

• STDYFLG is set true and $s_k >$ FT (the flight time in hours),

• STDYFLG is set true and the maximum number of step doublings
has been made.
The stepsize for the summation procedure is doubled after computing $y_k$ if
the following condition is met:

• $|y_k - y_{k-1}| \,/\, \max(|y_{k-2}|, |y_{k-1}|, |y_k|) < \text{ZERODF}$.

The variable ZERODF (default value = .05), used to control the stepsize
heuristic, is adjusted to obtain the sum function in the available space
as follows:

• ZERODF = ZERODF - DIFCHNG

when the maximum number of step doublings is exceeded,

• ZERODF = ZERODF + DIFCHNG

when the maximum number of function values is exceeded.

In both these cases the entire sum function is recomputed one more time;
if either condition recurs, an error message is displayed and the COVRGE
program is terminated.
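A compact Python sketch of the summation loop described above (equations 4.4-6 to 4.4-9 plus the zero, truncation and step-doubling tests). It illustrates the procedure under stated assumptions and is not a transcription of SUMARS. The names `sum_arrays` and `interp`, the zero extension of each input outside its tabulated range, and the plain `RuntimeError` in place of the ZERODF/DIFCHNG retry are all hypothetical; the STDYFLG tests are omitted.

```python
import bisect
import math


def interp(ts, ys, t):
    """Piecewise-linear interpolation in a tabulated function.

    Assumption of this sketch: the function is taken as zero outside [ts[0], ts[-1]].
    """
    if t < ts[0] or t > ts[-1]:
        return 0.0
    i = bisect.bisect_right(ts, t)
    if i == len(ts):
        return ys[-1]
    w = (t - ts[i - 1]) / (ts[i] - ts[i - 1])
    return ys[i - 1] + w * (ys[i] - ys[i - 1])


def sum_arrays(t1, y1, t2, y2, c1, c2, dt0, zerodf=0.05, trunc=1e-4,
               max_doublings=10, max_points=2000):
    """Tabulate c1*y1(s) + c2*y2(s) on an adaptively doubled grid (eqs. 4.4-6 to 4.4-9)."""
    t_end = max(t1[-1], t2[-1])
    s = [0.0]
    y = [c1 * interp(t1, y1, 0.0) + c2 * interp(t2, y2, 0.0)]
    dt, doublings = dt0, 0
    while True:
        k = len(y) - 1
        # Termination tests (the STDYFLG tests are omitted here).
        if k >= 2 and y[k] == y[k - 1] == y[k - 2] == 0.0:
            break
        if abs(y[k]) < trunc and s[k] > t_end:
            break
        # Step doubling when the relative change over the last step is small.
        if k >= 2 and doublings < max_doublings:
            scale = max(abs(y[k - 2]), abs(y[k - 1]), abs(y[k]))
            if scale > 0.0 and abs(y[k] - y[k - 1]) / scale < zerodf:
                dt *= 2.0
                doublings += 1
        if len(y) >= max_points:
            # CARE-III adjusts ZERODF by DIFCHNG and retries once; this sketch just stops.
            raise RuntimeError("sum does not fit in the available array space")
        s.append(s[k] + dt)
        y.append(c1 * interp(t1, y1, s[-1]) + c2 * interp(t2, y2, s[-1]))
    return s, y


if __name__ == "__main__":
    grid = [0.05 * i for i in range(101)]
    s, y = sum_arrays(grid, [math.exp(-t) for t in grid],
                      grid, [math.exp(-2.0 * t) for t in grid], 1.0, 0.5, 0.05)
    print(len(s), s[-1])
```

Because the doubling test looks only at the last step, the grid stays fine where the sum changes quickly and coarsens where it flattens out, which is the behavior the ZERODF threshold is meant to control.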

4.4.3 Numerical Integration 

The numerical integration procedure used in the COVRGE program is based on
Simpson's Rule and is implemented in subroutines VSTPINT and SIMPINT. The
input function must be stored in a CARE-III, Type A function array, and
the p-th moment of the function is evaluated at the same discrete time
points as the input function:

$$y_j = f(t_j); \quad j = 1, 2, \ldots, j_{max} \tag{4.4-10}$$

$$Y_j^p = \int_0^{t_j} \tau^p f(\tau)\, d\tau; \quad j = 1, 2, \ldots, j_{max}, \quad p = 0, 1, 2 \tag{4.4-11}$$
Subroutine VSTPINT computes:

$$Y_j^p = \sum_{m=1}^{n-1} \int_{t_{J_m}}^{t_{J_{m+1}}} \tau^p f(\tau)\, d\tau + \int_{t_{J_n}}^{t_j} \tau^p f(\tau)\, d\tau; \quad j = 1, 2, \ldots, j_{max} \tag{4.4-12}$$

where (refer to Figure 4.3-5)

$$j = 1 + k + \sum_{m=1}^{n-1} N_m; \quad 1 \le k \le N_n, \tag{4.4-13}$$

$$J_n = \begin{cases} 1, & n = 1 \\ 1 + \sum_{m=1}^{n-1} N_m, & n = 2, 3, \ldots, N_{max} \end{cases} \tag{4.4-14}$$

and the integrals (over the fixed stepsize intervals) are computed in
subroutine SIMPINT via Simpson's Rule. For efficiency, the values of the
integrals:

$$I_m^p = \int_{t_{J_m}}^{t_{J_{m+1}}} \tau^p f(\tau)\, d\tau; \quad m = 1, 2, \ldots, N_{max} - 1, \tag{4.4-15}$$

are saved by VSTPINT and $Y_j^p$ is computed as follows:

$$Y_j^p = \sum_{m=1}^{n-1} I_m^p + \int_{t_{J_n}}^{t_j} \tau^p f(\tau)\, d\tau \tag{4.4-16}$$
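Subroutine SIMPINT computes the integrals:

$$\int_{t_{J_m}}^{t_j} \tau^p f(\tau)\, d\tau, \tag{4.4-17}$$

where the index j is defined by VSTPINT as follows:

$$J_m \le j \le J_{m+1}. \tag{4.4-18}$$

The integrals are evaluated by Simpson's Rule, where $\Delta t$ is the fixed
stepsize of the interval:

• Case 1: $j = J_m$

$$\int_{t_{J_m}}^{t_j} \tau^p f(\tau)\, d\tau = 0 \tag{4.4-19}$$

• Case 2: $j - J_m = 1$

$$\int_{t_{J_m}}^{t_j} \tau^p f(\tau)\, d\tau \approx \frac{\Delta t}{6}\left[y_{j-1}\, t_{j-1}^p + 4\, \frac{y_{j-1} + y_j}{2} \left(\frac{t_{j-1} + t_j}{2}\right)^p + y_j\, t_j^p\right] \tag{4.4-20}$$

• Case 3: $j - J_m > 1$, $j - J_m$ even

$$\int_{t_{J_m}}^{t_j} \tau^p f(\tau)\, d\tau \approx \sum_{i = J_m,\, J_m + 2,\, \ldots}^{j-2} \frac{\Delta t}{3}\left(y_i t_i^p + 4 y_{i+1} t_{i+1}^p + y_{i+2} t_{i+2}^p\right) \tag{4.4-21}$$

• Case 4: $j - J_m > 1$, $j - J_m$ odd

$$\int_{t_{J_m}}^{t_j} \tau^p f(\tau)\, d\tau \approx \sum_{i = J_m,\, J_m + 2,\, \ldots}^{j-5} \frac{\Delta t}{3}\left(y_i t_i^p + 4 y_{i+1} t_{i+1}^p + y_{i+2} t_{i+2}^p\right) + \frac{3\Delta t}{8}\left(y_{j-3} t_{j-3}^p + 3 y_{j-2} t_{j-2}^p + 3 y_{j-1} t_{j-1}^p + y_j t_j^p\right) \tag{4.4-22}$$

In Cases 3 and 4 the summation index i is incremented by 2.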
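The four SIMPINT cases can be sketched in Python for a single block of constant stepsize. The name `simpson_moment` and the 0-based indexing are assumptions of this sketch. Case 2 uses the interpolated midpoint described above, Case 3 the composite 1/3 rule, and Case 4 the 1/3 rule followed by the 3/8 rule on the last three intervals.

```python
import math


def simpson_moment(ts, ys, jm, j, p):
    """Approximate int_{ts[jm]}^{ts[j]} tau**p f(tau) dtau on a fixed-step block.

    ts, ys hold t_i and y_i = f(t_i) with constant spacing between jm and j
    (0-based indices). The cases follow eqs. 4.4-19 to 4.4-22.
    """
    def g(i):
        return ys[i] * ts[i] ** p

    n = j - jm
    if n == 0:                                   # Case 1
        return 0.0
    dt = ts[jm + 1] - ts[jm]
    if n == 1:                                   # Case 2: interpolated midpoint value
        y_mid = 0.5 * (ys[jm] + ys[j])
        t_mid = 0.5 * (ts[jm] + ts[j])
        return dt / 6.0 * (g(jm) + 4.0 * y_mid * t_mid ** p + g(j))
    total = 0.0
    stop = j if n % 2 == 0 else j - 3            # Case 4 leaves the last 3 intervals
    for i in range(jm, stop, 2):                 # Simpson's 1/3 rule, i incremented by 2
        total += dt / 3.0 * (g(i) + 4.0 * g(i + 1) + g(i + 2))
    if n % 2 == 1:                               # Case 4: Simpson's 3/8 rule at the end
        total += 3.0 * dt / 8.0 * (g(j - 3) + 3.0 * g(j - 2) + 3.0 * g(j - 1) + g(j))
    return total


if __name__ == "__main__":
    ts = [0.1 * i for i in range(11)]
    ys = [math.exp(t) for t in ts]
    # first moment of exp(t) over [0, 0.7]; exact value is 1 - 0.3 e^0.7
    print(simpson_moment(ts, ys, 0, 7, 1), 1.0 - 0.3 * math.exp(0.7))
```

Both the 1/3 and 3/8 rules are exact for cubic integrands, so a linear f with p = 2 reproduces the exact moment in every case, which gives a quick consistency check of the case logic.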


4.4.4 Numerical Convolution 


The numerical convolution procedure used in the COVRGE program
approximates the convolution integral with a discrete sum based on the
Trapezoidal Rule and is implemented in subroutines PREVNRC, VLTNREC and
CNVLINT. The input functions must be stored in CARE-III, Type A function
arrays, and the convolution is computed for a selected set of discrete
time points:

$$y_k = f(s_k) + \beta \int_0^{s_k} \phi(s_k - \tau)\, g(\tau)\, d\tau; \quad k = 1, 2, \ldots, k_{max}. \tag{4.4-23}$$

The discrete time points for the output function are automatically
selected by VLTNREC and the output is stored as a CARE-III, Type A function
array.

The convolution is computed with a step-by-step procedure that is
initiated by setting:

$$s_1 = 0, \tag{4.4-24}$$

$$y_1 = f(s_1). \tag{4.4-25}$$

The k-th step consists of selecting a stepsize $\Delta t_k$ for the step and
then computing the convolution integral at

$$s_{k+1} = s_k + \Delta t_k. \tag{4.4-26}$$

Writing the discrete time points for the φ and g functions as the sets:

$$T^\phi = \{t_j^\phi : j = 1, 2, \ldots, J_{max}^\phi\}, \tag{4.4-27}$$

$$T^g = \{t_j^g : j = 1, 2, \ldots, J_{max}^g\}, \tag{4.4-28}$$

the discrete time points used for the approximation of the convolution
integral at $s_{k+1}$ are

$$T_k = T_\phi \cup T_g = \{t_n : n = 1, 2, \ldots, N\}, \tag{4.4-29}$$

where

$$T_\phi = \{s_{k+1} - t_j^\phi : t_j^\phi \le s_{k+1},\ 1 \le j \le J_{max}^\phi\}, \tag{4.4-30}$$

$$T_g = \{t_j^g : t_j^g \le s_{k+1},\ 1 \le j \le J_{max}^g\}, \tag{4.4-31}$$

and the $t_n$ in $T_k$ are numbered so that the $t_n$ are increasing in size.
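It is noted that:

$$t_1 = 0, \tag{4.4-31}$$

$$t_N = s_{k+1}. \tag{4.4-32}$$

The value of $y_{k+1}$ is computed by applying the Trapezoidal Rule to
evaluate the convolution integral at $s_{k+1}$ (in subroutine CNVLINT):

$$y_{k+1} = f(s_{k+1}) + \beta\left[\tfrac{1}{2}(t_2 - t_1)\,\phi(s_{k+1} - t_1)\,g(t_1) + \sum_{n=2}^{N-1} \tfrac{1}{2}(t_{n+1} - t_{n-1})\,\phi(s_{k+1} - t_n)\,g(t_n) + \tfrac{1}{2}(t_N - t_{N-1})\,\phi(s_{k+1} - t_N)\,g(t_N)\right] \tag{4.4-33}$$

Linear interpolation is used to evaluate the f, φ and g functions at the
discrete time points indicated in the sum in equation 4.4-33.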
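A Python sketch of one evaluation of equation 4.4-33: it builds the merged node set of equations 4.4-29 to 4.4-31 and applies trapezoidal weights $(t_{n+1} - t_{n-1})/2$, halved at the two ends. The names `convolution_at` and `interp` and the zero extension outside each tabulated range are assumptions of this sketch; the stepsize selection performed by VLTNREC is not modeled.

```python
import bisect
import math


def interp(ts, ys, t, tol=1e-12):
    """Linear interpolation; zero outside [ts[0], ts[-1]] (an assumption of this sketch)."""
    if t < ts[0] - tol or t > ts[-1] + tol:
        return 0.0
    t = min(max(t, ts[0]), ts[-1])
    i = bisect.bisect_right(ts, t)
    if i == len(ts):
        return ys[-1]
    w = (t - ts[i - 1]) / (ts[i] - ts[i - 1])
    return ys[i - 1] + w * (ys[i] - ys[i - 1])


def convolution_at(s, f_s, beta, t_phi, y_phi, t_g, y_g):
    """f(s) + beta * int_0^s phi(s - tau) g(tau) dtau by the trapezoidal rule (eq. 4.4-33).

    Nodes are {s - t_phi} U {t_g} restricted to [0, s], so each tabulated
    function is sampled at (or next to) its own grid points.
    """
    nodes = sorted({s - tp for tp in t_phi if tp <= s} | {tg for tg in t_g if tg <= s})
    total = 0.0
    for n, tn in enumerate(nodes):
        left = tn - nodes[n - 1] if n > 0 else 0.0
        right = nodes[n + 1] - tn if n < len(nodes) - 1 else 0.0
        # trapezoidal weight (t_{n+1} - t_{n-1}) / 2, one half-interval at each end
        total += 0.5 * (left + right) * interp(t_phi, y_phi, s - tn) * interp(t_g, y_g, tn)
    return f_s + beta * total


if __name__ == "__main__":
    grid = [0.01 * i for i in range(301)]
    # phi(t) = exp(-t), g = 1, f = 0, beta = 1: exact value 1 - exp(-s)
    print(convolution_at(1.0, 0.0, 1.0, grid, [math.exp(-t) for t in grid], grid, [1.0] * 301),
          1.0 - math.exp(-1.0))
```

When both grids start at 0, the node set automatically contains $t_1 = 0$ and $t_N = s$, matching equations 4.4-31 and 4.4-32.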



The convolution procedure is monitored by heuristic controls which
determine when the convolution is zero or may be truncated, select the
stepsizes $\Delta t_k$, and determine when the convolution cannot be
obtained in the available space in a CARE-III, Type A function array; each
of these controls is briefly described below.

The convolution procedure is terminated after computing $y_k$ if one of the
following conditions is met:

• $y_k < 0$,

• $y_k < \text{TRUNC}$ (default value = .0001) and the maximum number of
step doublings has been made,

• $|y_k - y_{k-2}| \,/\, \max(y_{k-2}, y_{k-1}, y_k) < \text{STDYDIF}$
(default value = .0005) and the maximum number of step doublings has
been made.

The stepsize for the convolution procedure is doubled after computing $y_k$
if the following condition is met:

• $|y_k - y_{k-1}| \,/\, \max(y_{k-2}, y_{k-1}, y_k) < \text{ZERODF}$.

The variable ZERODF (default value = .05), used to control the stepsize
doubling heuristic, is adjusted to obtain the convolution in the
available space as follows:

• ZERODF = ZERODF - DIFCHNG

when the maximum number of step doublings is exceeded,

• ZERODF = ZERODF + DIFCHNG

when the maximum number of function values is exceeded.

In both these cases the entire convolution is recomputed one more time; if
either case recurs, an error message is displayed and the COVRGE program
is terminated.

4.4.5 Numerical Solution of Volterra Integral Equation 

The numerical procedure used in the COVRGE program to solve Volterra
integral equations of the second kind is based on the numerical convolution
procedure described in Section 4.4.4 and is implemented in subroutines
VOLTERA and CNVLINT. The input functions must be stored in CARE-III,
Type A function arrays, and the solution is computed for a selected set of
discrete time points:

$$y_k = f(s_k) + \beta \int_0^{s_k} \phi(s_k - \tau)\, y(\tau)\, d\tau; \quad k = 1, 2, \ldots, k_{max}. \tag{4.4-34}$$
The discrete time points for the solution are automatically selected by
VOLTERA and the solution is stored as a CARE-III, Type A function array.

The solution is computed with a step-by-step procedure that is initiated
by setting:

$$s_1 = 0, \tag{4.4-35}$$

$$y_1 = f(s_1). \tag{4.4-36}$$

The k-th step consists of selecting a stepsize $\Delta t_k$ for the step and
then solving the integral equation at

$$s_{k+1} = s_k + \Delta t_k. \tag{4.4-37}$$

First the set of discrete time points needed for approximation of the
convolution integral at $s_{k+1}$ is selected in the manner outlined in
Section 4.4.4:

$$T_k = \{t_n : n = 1, 2, \ldots, N\}, \tag{4.4-38}$$

and it is noted that:

$$t_1 = s_1 = 0, \tag{4.4-39}$$

$$t_M = s_k \text{ for some } M < N, \tag{4.4-40}$$

$$t_N = s_{k+1}. \tag{4.4-41}$$



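For $0 \le t \le t_M$, y(t) can be estimated by linear interpolation from the
known values of y at the discrete times $s_1, s_2, \ldots, s_k$. For
$t_M < t \le t_N$, y(t) must be estimated by linear interpolation of the
values of y at $t_M$ and $t_N$:

$$y(t) = y(t_M)\,\frac{t_N - t}{t_N - t_M} + y(t_N)\,\frac{t - t_M}{t_N - t_M} \tag{4.4-42}$$

$$y(t) = y_k\,\frac{t_N - t}{t_N - t_M} + y_{k+1}\,\frac{t - t_M}{t_N - t_M} \tag{4.4-43}$$

The value of $y_{k+1}$ is computed by applying the Trapezoidal Rule to
evaluate the convolution integral at $s_{k+1}$ (in subroutine CNVLINT):

$$y_{k+1} = f(t_N) + \beta\left[\tfrac{1}{2}(t_2 - t_1)\,\phi(t_N - t_1)\,y_1 + \sum_{n=2}^{N-1} \tfrac{1}{2}(t_{n+1} - t_{n-1})\,\phi(t_N - t_n)\,y(t_n) + \tfrac{1}{2}(t_N - t_{N-1})\,\phi(0)\,y_{k+1}\right] \tag{4.4-44}$$

Substituting equation 4.4-43 for the points with $t_M < t_n < t_N$:

$$y_{k+1} = f(t_N) + \beta\left[\tfrac{1}{2}(t_2 - t_1)\,\phi(t_N - t_1)\,y_1 + \sum_{n=2}^{M} \tfrac{1}{2}(t_{n+1} - t_{n-1})\,\phi(t_N - t_n)\,y(t_n) + \sum_{n=M+1}^{N-1} \tfrac{1}{2}(t_{n+1} - t_{n-1})\,\phi(t_N - t_n)\left(y_k\,\frac{t_N - t_n}{t_N - t_M} + y_{k+1}\,\frac{t_n - t_M}{t_N - t_M}\right) + \tfrac{1}{2}(t_N - t_{N-1})\,\phi(0)\,y_{k+1}\right] \tag{4.4-45}$$

and solving for $y_{k+1}$:

$$y_{k+1} = \frac{f(t_N) + \beta\left[\tfrac{1}{2}(t_2 - t_1)\,\phi(t_N - t_1)\,y_1 + \sum_{n=2}^{M} \tfrac{1}{2}(t_{n+1} - t_{n-1})\,\phi(t_N - t_n)\,y(t_n) + y_k \sum_{n=M+1}^{N-1} \tfrac{1}{2}(t_{n+1} - t_{n-1})\,\phi(t_N - t_n)\,\frac{t_N - t_n}{t_N - t_M}\right]}{1 - \beta\left[\sum_{n=M+1}^{N-1} \tfrac{1}{2}(t_{n+1} - t_{n-1})\,\phi(t_N - t_n)\,\frac{t_n - t_M}{t_N - t_M} + \tfrac{1}{2}(t_N - t_{N-1})\,\phi(0)\right]} \tag{4.4-46}$$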


Linear interpolation is used to evaluate the f, 0 and y functions at the 
discrete time points indicated in equation 4.4-46. 
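A fixed-step Python sketch of the update in equation 4.4-46. Rather than coding the fraction term by term, it accumulates the trapezoidal sum as `known + coeff * y_{k+1}`, using the interpolation weights of equation 4.4-43, and then solves that linear equation for $y_{k+1}$; this is the same rearrangement that produces 4.4-46. Step doubling, truncation and the other VOLTERA heuristics are omitted, and the names `solve_volterra` and `interp` are hypothetical.

```python
import bisect
import math


def interp(ts, ys, t, tol=1e-12):
    """Linear interpolation; zero outside [ts[0], ts[-1]] (an assumption of this sketch)."""
    if t < ts[0] - tol or t > ts[-1] + tol:
        return 0.0
    t = min(max(t, ts[0]), ts[-1])
    i = bisect.bisect_right(ts, t)
    if i == len(ts):
        return ys[-1]
    w = (t - ts[i - 1]) / (ts[i] - ts[i - 1])
    return ys[i - 1] + w * (ys[i] - ys[i - 1])


def solve_volterra(f, beta, t_phi, y_phi, dt, t_end):
    """March y_{k+1} of eq. 4.4-46 with a fixed stepsize dt.

    f is a Python callable; the kernel phi is tabulated on (t_phi, y_phi).
    """
    s, y = [0.0], [f(0.0)]
    while s[-1] < t_end - 1e-12:
        s_new = s[-1] + dt
        t_m, y_m = s[-1], y[-1]
        nodes = sorted({s_new - tp for tp in t_phi if tp <= s_new} | set(s) | {s_new})
        known = 0.0   # trapezoidal terms that do not involve y_{k+1}
        coeff = 0.0   # total coefficient of y_{k+1}
        for n, tn in enumerate(nodes):
            left = tn - nodes[n - 1] if n > 0 else 0.0
            right = nodes[n + 1] - tn if n < len(nodes) - 1 else 0.0
            w = 0.5 * (left + right) * interp(t_phi, y_phi, s_new - tn)
            if tn <= t_m:
                known += w * interp(s, y, tn)          # y already known for t <= t_M
            else:
                lam = (tn - t_m) / (s_new - t_m)        # weights of eq. 4.4-43
                known += w * (1.0 - lam) * y_m
                coeff += w * lam
        y_new = (f(s_new) + beta * known) / (1.0 - beta * coeff)
        s.append(s_new)
        y.append(y_new)
    return s, y


if __name__ == "__main__":
    grid = [0.01 * i for i in range(201)]
    s, y = solve_volterra(lambda t: 1.0, -1.0, grid, [1.0] * 201, 0.01, 1.0)
    print(y[-1], math.exp(-s[-1]))   # phi = 1, beta = -1, f = 1 gives y = exp(-t)
```

Two closed-form cases are convenient for checking the scheme: φ ≡ 1, β = -1, f ≡ 1 gives y(t) = e^{-t}, and φ(t) = e^{-t}, β = 1, f ≡ 1 gives y(t) = 1 + t.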
The solution procedure is monitored by heuristic controls which determine
when the solution is zero or may be truncated, select the stepsizes
$\Delta t_k$, and determine when a solution cannot be obtained in the
available space in a CARE-III, Type A function array; each of these
controls is briefly described below.

The solution procedure is terminated after computing $y_k$ if one of the
following conditions is met:

• $y_k < \text{REALMIN}$ (default value = $10^{-293}$),

• $y_k < \text{TRUNC}$ (default value = .0001) and $s_k >$ maximum time for
the φ function,

• $y_k < \text{TRUNC}$ (default value = .0001) and the maximum number of
step doublings has been made,

• STDYFLG is set true and $s_k >$ FT (the flight time in hours),

• STDYFLG is set true and the maximum number of step doublings has
been made.

The stepsize for the solution procedure is doubled after computing $y_k$ if
one of the following conditions is met:

• $|y_k - y_{k-2}| \,/\, \max(y_{k-2}, y_{k-1}, y_k) \le \text{ZERODF}$:

The stepsize is doubled if y(t) has not had a relative maximum or
minimum in IHLDDUB steps. In addition, the control flag STDYFLG is
set to true if y(t) does not have a relative maximum or minimum at
(remainder of this condition illegible in the source).

• $|y_k - y_{k-1}| \,/\, \max(y_{k-2}, y_{k-1}, y_k) \le \text{ZERODF}$:

The stepsize is doubled if y(t) has not had a relative maximum or
minimum in IHLDDUB steps.
The variable ZERODF (default value = .05), used to control the stepsize
doubling heuristic, is adjusted to obtain the solution in the available
space as follows:

• ZERODF = ZERODF - DIFCHNG

when the maximum number of step doublings is exceeded,

• ZERODF = ZERODF + DIFCHNG

when the maximum number of function values is exceeded and the
last value of y is less than 1,

• ZERODF = ZERODF - DIFCHNG

when the maximum number of function values is exceeded and the
last value of y is greater than 1.

In each of these cases the entire solution is recomputed one more time; if
any of these conditions recurs, an error message is displayed and the
COVRGE program is terminated.
A. APPENDIX


A.0 MARKOV AND SEMI-MARKOV PROCESSES

This appendix gives brief descriptions and some properties of Markov and
semi-Markov processes. References on these topics are Parzen (1967),
Feller (1968), Ross (1970) and Cinlar (1975).

Consider a physical system that moves from one state to another with
random sojourn times in between. Assume that the number of possible states
is finite, and that the paths followed by the state of the system are right
continuous and piecewise constant. That is, if X(0), X(1), ... are the
successive states visited, 0 = T(0) < T(1) < ... are the successive jump
times, and Y(t) is the state of the system at time t, then the system
starts in state X(0) at time T(0) = 0 and remains there until time T(1),

$$Y(t) = X(0) \quad \text{for } T(0) \le t < T(1);$$

at time T(1) it jumps to state X(1) and remains there until time T(2),

$$Y(t) = X(1) \quad \text{for } T(1) \le t < T(2);$$

and so on. In general,

$$Y(t) = X(n) \quad \text{for } T(n) \le t < T(n+1).$$

Equivalently,

$$X(n) = Y(T(n)), \qquad T(n) = \inf\{t > T(n-1) : Y(t) \ne Y(T(n-1))\}.$$
The processes $Y = \{Y_t : t \ge 0\}$ and $(X, T) = \{(X_n, T_n) : n \ge 0\}$
are "equivalent" in the sense that they give the same information.

The problem is to determine the state probabilities

$$P_{ij}(t) = P[Y(t) = j \mid Y(0) = i]$$

under some assumptions on the stochastic model (time homogeneity,
independence from past history, distributions of sojourn times).

The following notation will be used:

$$Q_{ij}(t) = P[X(n+1) = j,\ T(n+1) - T(n) \le t \mid X(n) = i],$$

$$Q_{ij} = P[X(n+1) = j \mid X(n) = i] = Q_{ij}(\infty),$$

$$G_{ij}(t) = P[T(n+1) - T(n) \le t \mid X(n) = i,\ X(n+1) = j] = Q_{ij}(t)/Q_{ij}.$$

A.1 MARKOV PROCESSES

Definition: $Y = \{Y(t) : t \ge 0\}$ is said to be a Markov process if

$$P[Y(t+s) = j \mid Y(u) : u \le t] = P[Y(t+s) = j \mid Y(t)]$$

for all $t, s \ge 0$; i.e., the future of the process is independent of its
past provided that the present state is known.

If $P_{ij}(s,t)$ is used to denote the probability that the system is in
state j at time t, given that it was in state i at time s (s < t), then
these probabilities satisfy the Chapman-Kolmogorov equations,

$$P_{ij}(s,t) = \sum_k P_{ik}(s,u)\, P_{kj}(u,t) \quad \text{for } s < u < t.$$
Assumptions

(a) For every state i there is a nonnegative continuous function
$\lambda_i(t)$ such that $[1 - P_{ii}(t, t+h)]/h$ tends to $\lambda_i(t)$ as
h tends to 0.

(b) For each pair of distinct states i, j there correspond transition
probabilities $q_{ij}(t)$ such that $P_{ij}(t, t+h)/h$ tends to
$\lambda_i(t)\, q_{ij}(t)$ as h tends to 0.

The functions $q_{ij}(t)$ are continuous in t, equal to zero when i = j,
and for each fixed i add up to one.

The state probabilities are then evaluated using either the forward or
backward equations:

(1) forward equations

$$\frac{\partial P_{ij}(s,t)}{\partial t} = -P_{ij}(s,t)\,\lambda_j(t) + \sum_{k \ne j} P_{ik}(s,t)\,\lambda_k(t)\,q_{kj}(t),$$

or, in integral form,

$$P_{ij}(s,t) = \delta_{ij} \exp\{-[\Lambda_j(t) - \Lambda_j(s)]\} + \sum_{k \ne j} \int_s^t P_{ik}(s,u)\,\lambda_k(u)\,q_{kj}(u)\, \exp\{-[\Lambda_j(t) - \Lambda_j(u)]\}\, du,$$

where

$$\Lambda_j(t) = \int_0^t \lambda_j(u)\, du.$$

(2) backward equations

$$P_{ij}(s,t) = \delta_{ij} \exp\{-[\Lambda_i(t) - \Lambda_i(s)]\} + \sum_{k \ne i} \int_s^t \exp\{-[\Lambda_i(u) - \Lambda_i(s)]\}\,\lambda_i(u)\,q_{ik}(u)\,P_{kj}(u,t)\, du.$$



A.2 HOMOGENEOUS MARKOV PROCESSES


If in the definition of a Markov process it is also assumed that the
transition probabilities $P_{ij}(s,t)$ depend on the times s and t only
through their difference t - s, then the process is said to be homogeneous
and the term $P_{ij}(t-s)$ is used instead.

For this case the functions $\lambda_i$ and $q_{ij}$ are independent of
time and so will be written without reference to that parameter. Such
processes satisfy the following properties:

(1) $$P[X(n+1) = j,\ T(n+1) - T(n) > t \mid X(0), \ldots, X(n), T(0), \ldots, T(n)] = q_{ij}\, e^{-\lambda_i t} \quad \text{when } X(n) = i,$$

where $q_{ij} \ge 0$, $q_{ii} = 0$, $\sum_j q_{ij} = 1$ and
$0 \le \lambda_i \le \infty$.

The state i is said to be absorbing, stable or instantaneous depending on
whether $\lambda_i = 0$, $0 < \lambda_i < \infty$, or $\lambda_i = \infty$.

(2) $X = \{X(n) : n \ge 0\}$ is a Markov chain with transition matrix
$Q = [q_{ij}]$.

(3) The times between jumps are conditionally independent given the
successive states being visited, and each sojourn time is exponentially
distributed with a parameter that depends on the state being visited,
i.e.,

$$P[T(1) - T(0) > u_1, \ldots, T(n) - T(n-1) > u_n \mid X(0) = i_0, \ldots, X(n) = i_n] = e^{-\lambda_{i_0} u_1} \cdots e^{-\lambda_{i_{n-1}} u_n}.$$

The state probabilities are evaluated using either the forward or backward
equations; e.g., the forward equation becomes

$$\frac{dP_{ij}(t)}{dt} = -\lambda_j\, P_{ij}(t) + \sum_{k \ne j} P_{ik}(t)\,\lambda_k\, q_{kj}.$$
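A small Python sketch that integrates the homogeneous forward equations, written in matrix form as dP/dt = PA with $A_{jj} = -\lambda_j$ and $A_{kj} = \lambda_k q_{kj}$ for k ≠ j. The classical fourth-order Runge-Kutta method is a choice of this sketch, not something the appendix prescribes, and the name `forward_equations` is hypothetical. The example is a two-state active/benign chain with rates α (active to benign) and β (benign to active), for which $P_{00}(t) = \beta/(\alpha+\beta) + \alpha/(\alpha+\beta)\,e^{-(\alpha+\beta)t}$ is a standard closed form.

```python
import math


def forward_equations(lam, q, t, steps=1000):
    """Integrate dP/dt = P A from P(0) = I for a homogeneous Markov process (RK4).

    Each entry of P A equals the right-hand side of the forward equation:
    -lam_j P_ij + sum_{k != j} P_ik lam_k q_kj.
    """
    n = len(lam)
    A = [[-lam[i] if i == j else lam[i] * q[i][j] for j in range(n)] for i in range(n)]

    def rhs(P):
        return [[sum(P[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

    def shift(P, K, c):
        return [[P[i][j] + c * K[i][j] for j in range(n)] for i in range(n)]

    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    h = t / steps
    for _ in range(steps):
        k1 = rhs(P)
        k2 = rhs(shift(P, k1, h / 2))
        k3 = rhs(shift(P, k2, h / 2))
        k4 = rhs(shift(P, k3, h))
        P = [[P[i][j] + h / 6 * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j])
              for j in range(n)] for i in range(n)]
    return P


if __name__ == "__main__":
    alpha, beta = 0.8, 0.3          # hypothetical rates; state 0 = active, 1 = benign
    P = forward_equations([alpha, beta], [[0.0, 1.0], [1.0, 0.0]], 2.0)
    exact = beta / (alpha + beta) + alpha / (alpha + beta) * math.exp(-(alpha + beta) * 2.0)
    print(P[0][0], exact)
```

Because each row of A sums to zero, the rows of P keep summing to one throughout the integration, which is a useful sanity check on larger models.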



A.3 SEMI-MARKOV PROCESSES


Definition: $Y = \{Y(t) : t \ge 0\}$ is said to be a semi-Markov process if
for any s and any jump time T (T = T(0) or T(1) or ...),

$$P[Y(T+s) = j \mid Y(u),\ u \le T] = P[Y(T+s) = j \mid Y(T)];$$

i.e., the future and the past are conditionally independent given the
present if the present is a time of transition.

Such a process has the following properties:

(1) $X = \{X(n) : n \ge 0\}$ is a Markov chain with transition probability
matrix $Q = [Q_{ij}]$.

(2) Sojourn times are conditionally independent, but their distributions
depend both on the state being visited and on the next one. Also, these
distributions are arbitrary (not necessarily exponential), i.e.,

$$P[T(1) - T(0) \le u_1, \ldots, T(n) - T(n-1) \le u_n \mid X(0), \ldots, X(n)] = G(X(0), X(1), u_1) \cdots G(X(n-1), X(n), u_n).$$

Notation

$N_j(t)$ = number of transitions into state j in (0, t];

$R_{ij}(t) = \delta_{ij} + E[N_j(t) \mid Y(0) = i]$ = expected number of
visits to state j in [0, t] given that the initial state is i;
$F_{ij}(t) = P[N_j(t) > 0 \mid Y(0) = i]$ = distribution of the time
between an entry to state i and the first subsequent entry to state j;

$f_{ij}(t) = \lim_{h \to 0} \frac{1}{h}\, P[N_j(t+h) > 0,\ N_j(t) = 0 \mid Y(0) = i]$
= intensity of entry into state j at time t, given that the initial state
is i.

Properties

(3) $$R_{ij}(t) = \delta_{ij} + \sum_k \int_0^t Q_{ik}(ds)\, R_{kj}(t-s),$$

$$F_{ij}(t) = Q_{ij}(t) + \sum_{k \ne j} \int_0^t Q_{ik}(ds)\, F_{kj}(t-s).$$

(4) $$P_{ij}(t) = \int_0^t R_{ij}(ds)\, h_j(t-s)$$

$$= \delta_{ij}\, h_i(t) + \sum_k \int_0^t Q_{ik}(ds)\, P_{kj}(t-s)$$

$$= \delta_{ij}\, h_i(t) + \int_0^t F_{ij}(ds)\, P_{jj}(t-s),$$

where

$$h_i(t) = 1 - \sum_k Q_{ik}(t) = P[T(n+1) - T(n) > t \mid X(n) = i].$$
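The second form of property (4) is a Markov renewal equation that can be marched forward in time. The Python sketch below replaces the Stieltjes integral by the explicit first-order sum $\sum_m [Q_{ik}(t_m) - Q_{ik}(t_{m-1})]\,P_{kj}(t_n - t_m)$, a discretization chosen for this sketch; the name `semi_markov_probs` is hypothetical. With exponential kernels $Q_{01}(t) = 1 - e^{-\alpha t}$ and $Q_{10}(t) = 1 - e^{-\beta t}$ the process reduces to the two-state homogeneous Markov process of Section A.2, whose closed form serves as a check; non-exponential kernels plug in the same way.

```python
import math


def semi_markov_probs(Q, t_end, n):
    """Solve P_ij(t) = delta_ij h_i(t) + sum_k int_0^t Q_ik(ds) P_kj(t - s) on a uniform grid.

    Q[i][k] is a callable giving the cumulative kernel Q_ik(t), with Q_ik(0) = 0.
    Returns a list P with P[m][i][j] approximating P_ij(m * t_end / n).
    """
    S = len(Q)
    dt = t_end / n
    # kernel increments dQ[i][k][m] = Q_ik(m dt) - Q_ik((m - 1) dt); index 0 unused
    dQ = [[[0.0] + [Q[i][k](m * dt) - Q[i][k]((m - 1) * dt) for m in range(1, n + 1)]
           for k in range(S)] for i in range(S)]

    def h(i, t):
        """Sojourn survival h_i(t) = 1 - sum_k Q_ik(t)."""
        return 1.0 - sum(Q[i][k](t) for k in range(S))

    P = [[[1.0 if i == j else 0.0 for j in range(S)] for i in range(S)]]
    for step in range(1, n + 1):
        t = step * dt
        Pn = [[0.0] * S for _ in range(S)]
        for i in range(S):
            for j in range(S):
                acc = h(i, t) if i == j else 0.0
                for k in range(S):
                    acc += sum(dQ[i][k][m] * P[step - m][k][j] for m in range(1, step + 1))
                Pn[i][j] = acc
        P.append(Pn)
    return P


if __name__ == "__main__":
    alpha, beta = 0.8, 0.3
    Q = [[lambda t: 0.0, lambda t: 1.0 - math.exp(-alpha * t)],
         [lambda t: 1.0 - math.exp(-beta * t), lambda t: 0.0]]
    P = semi_markov_probs(Q, 1.0, 200)
    print(P[-1][0][0], beta / (alpha + beta) + alpha / (alpha + beta) * math.exp(-(alpha + beta)))
```

The scheme is only first order in the step, but it conserves probability exactly: each row of the computed P sums to one at every step because the kernel increments telescope to $Q_{ik}(t)$.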